Adding the 'name' content of the selection parameter
by Assaf Gordon
Hello,
Continuing the discussion from Dec. 19th,
http://mail.bx.psu.edu/pipermail/galaxy-user/2008-December/000422.html
The following hack adds an additional key/value which contains the
'name' of the selected option in a <select> input parameter.
With this hack, the following tool configuration is possible (note the
label attribute on the output):
==========================
<tool>
..
<inputs>
<param format="fasta" type="data" name="input1"/>
<param type="select" name="database" label="Database">
<option value="/home/gordon/long/path/hg18.fa">Human (hg18)</option>
<option value="/home/gordon/long/path/dm3.fa">Fly (dm3)</option>
<option value="/home/gordon/long/path/mm9.fa">Mouse (mm9)</option>
</param>
</inputs>
<outputs>
<!-- old way -->
<data format="txt" name="output" label="Blat $input1 on $database" />
<!-- new way -->
<data format="txt" name="output" label="Blat $input1 on $database_name" />
</outputs>
==========================
Currently, using "$database" in the output/label part will put the
*value* of the selected option (e.g. "/home/gordon/long/path/hg18.fa")
in the dataset's name. This doesn't really help the user understand what
the dataset contains.
With this hack, for each <select> input, a new variable is created (with
the "_name" suffix) which will contain the *name* of the selected option
(e.g. "Human (hg18)").
For example, if the user selected the second option,
then "$database" will contain "/home/gordon/long/path/dm3.fa", and
"$database_name" will contain "Fly (dm3)".
I'm well aware this is an ugly hack. A better way would be to have each
parameter (in the incoming dictionary) be an object (e.g. ToolParameter)
and not a string. Then we could write something like "$database.name" or
"$database.value".
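The parameter-object idea could look roughly like this (a hypothetical sketch; the class name and attributes are invented for illustration, this is not actual Galaxy code):

```python
class SelectedOption:
    """Hypothetical wrapper so templates could use $database.value and $database.name."""
    def __init__(self, value, name):
        self.value = value  # e.g. "/home/gordon/long/path/dm3.fa"
        self.name = name    # e.g. "Fly (dm3)"

    def __str__(self):
        # behave like the plain value, for backward compatibility
        # with templates that just write $database
        return self.value

database = SelectedOption("/home/gordon/long/path/dm3.fa", "Fly (dm3)")
print("Blat input on %s" % database.name)  # label uses the readable name
print("blat %s ..." % database)            # command line still gets the path
```

With such an object, the "_name" suffix trick below would become unnecessary.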
The hack contains two patches.
The first adds support for 'value_to_display_text' with dynamic options,
in "parameters/basic.py":
==============================================================
--- galaxy_prod/lib/galaxy/tools/parameters/basic.py
+++ galaxy_devel/lib/galaxy/tools/parameters/basic.py
@@ -576,7 +576,7 @@
elif len( value ) == 1:
value = value[0]
return value
- def value_to_display_text( self, value, app ):
+ def value_to_display_text( self, value, app, other_values={} ):
if isinstance( value, UnvalidatedValue ):
suffix = "\n(value not yet validated)"
value = value.value
@@ -584,10 +584,11 @@
suffix = ""
if not isinstance( value, list ):
value = [ value ]
- # FIXME: Currently only translating values back to labels if they
- # are not dynamic
if self.is_dynamic:
- rval = map( str, value )
+ rval = [ ]
+ for name in self.options.get_options ( app , other_values ) :
+ if ( name[1] in value ):
+ rval.append ( name[0] )
else:
options = list( self.static_options )
rval = []
==============================================================
The second patch iterates over the tool's parameters and, for each
<select> parameter, adds the additional key/value for the selected option:
==============================================================
--- galaxy_prod/lib/galaxy/tools/actions/__init__.py
+++ galaxy_devel/lib/galaxy/tools/actions/__init__.py
@@ -96,6 +96,18 @@
on_text = ""
# Add the dbkey to the incoming parameters
incoming[ "dbkey" ] = input_dbkey
+
+ ##28dec2008, gordon
+ ## For every 'Select' parameter, we get only the 'value' part of the selection from the HTTP request.
+ ## The following code gets the 'name' part for the selected option
+ selection_parameter_names = {}
+ for param_name, param_selected_value in incoming.iteritems():
+ param_obj = tool.get_param ( param_name )
+ if isinstance( param_obj, basic.SelectToolParameter):
+ param_selected_name = param_obj.value_to_display_text( param_selected_value, trans, incoming )
+ selection_parameter_names [ param_name + "_name" ] = param_selected_name.replace("\n","")
+ incoming.update( selection_parameter_names )
+
# Keep track of parent / child relationships, we'll create all the
# datasets first, then create the associations
parent_to_child_pairs = []
==============================================================
I tried to make the changes as non-intrusive as possible. I hope the
added 'other_values' parameter doesn't affect other places that call
'value_to_display_text' - but I'm not quite sure about that (I'm still
very new to Python). I've tested this hack with several tools, and it
seems to work (or at least not crash horribly).
Comments are welcome,
Gordon.
request: add newlines to dataset's info field
by Assaf Gordon
Hello,
I'd like to request/suggest a tiny change in the dataset's info field.
Whenever a tool outputs some lines into STDOUT, they are displayed in
the dataset's info field.
Currently, if the info field contains several lines, they are displayed
as a single line (because HTML collapses newlines into spaces).
The following patch causes the info field to be displayed correctly.
=================================================================
--- galaxy_prod/lib/galaxy/datatypes/data.py
+++ galaxy_devel/lib/galaxy/datatypes/data.py
@@ -132,7 +132,7 @@
def display_info(self, dataset):
"""Returns formated html of dataset info"""
try:
- return escape(dataset.info)
+ return escape(dataset.info).replace("\n", "<br>")
except:
return "info unavailable"
def validate(self, dataset):
==================================================================
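The order of operations in the patch matters: escaping first and only then inserting the `<br>` tags means any '<' characters inside the tool's output stay escaped, while the added markup does not. A minimal sketch of the idea, using the stdlib escape function (the actual escape used in data.py may differ):

```python
from xml.sax.saxutils import escape

# multi-line tool output, including a character that must be escaped
info = "1000 reads processed\nwarning: score < 0 in 3 reads"

# escape first, then add markup: the '<' in the text is made safe,
# and only our <br> reaches the browser unescaped
html = escape(info).replace("\n", "<br>")
print(html)  # 1000 reads processed<br>warning: score &lt; 0 in 3 reads
```

Doing the replace before the escape would turn the `<br>` tags themselves into `&lt;br&gt;`.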
Thanks,
gordon.
Showing dataset state in History List View
by Assaf Gordon
Hello,
I'd like to suggest a new feature:
In the history list view, instead of just showing the 'size' of the
history (i.e. how many datasets it contains), show the state of each
dataset.
The reason is that some users run long jobs (or workflows with many
steps) in several histories in parallel - and they want to quickly know:
1. which jobs are running,
2. which jobs are completed,
3. whether there were any errors.
Currently, they have to switch to each history, and look at the state of
the datasets.
With this feature, all one needs to do is look at the history view.
Using the same color keys for ok/queued/running/error states,
users can quickly know:
1. If there's a grey box - some jobs are still queued.
2. If there are no grey boxes but some yellow boxes - some jobs are
still running.
3. If there are no grey boxes and no yellow boxes - all jobs have been
completed.
4. If there are red boxes - some jobs failed.
Attached pictures illustrate the feature (at different states of jobs).
To add this feature:
Extract the attached 'list.mako.tar.gz' to GALAXY/templates/history
(overriding the current list.mako).
You'll also need to add the following function to
GALAXY/lib/galaxy/model/__init__.py
class History, line ~399
    # returns number of datasets matching the requested state
    # Added by gordon, 24dec2008
    def get_dataset_count( self, state ):
        count = 0
        for data in self.datasets:
            if data.state == state and not data.deleted: count += 1
        return count
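The helper can be exercised on its own; here is a quick sketch with stand-in objects (the real History and Dataset classes live in galaxy.model, these minimal versions exist only for the demo):

```python
# Stand-in classes for illustration only; in Galaxy the real
# History/Dataset objects come from galaxy.model.
class Dataset(object):
    def __init__(self, state, deleted=False):
        self.state = state
        self.deleted = deleted

class History(object):
    def __init__(self, datasets):
        self.datasets = datasets

    # same logic as the snippet above
    def get_dataset_count(self, state):
        count = 0
        for data in self.datasets:
            if data.state == state and not data.deleted:
                count += 1
        return count

h = History([Dataset('ok'), Dataset('running'), Dataset('ok', deleted=True)])
print(h.get_dataset_count('ok'))  # deleted datasets are not counted -> 1
```

The template then calls get_dataset_count once per state ('ok', 'queued', 'running', 'error') to draw the colored boxes.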
Another feature in this list.mako is the separation of 'switch to' link
from the 'delete' and 'rename' links -
The 'switch to' link is much more important and more frequently used
than the other two, and users have been complaining about how hard it is
to click (without accidentally clicking 'delete' or 'rename').
Comments are welcome,
Gordon.
Storing/Peeking/Downloading compressed files
by Assaf Gordon
Hello,
I'd like to request/suggest a feature:
Semi-Transparent support for compressed files.
The feature requires four (tiny) patches (detailed below).
With this feature, dataset files (/database/files/NNN/dataset_NNNN.dat)
can be stored compressed, and their content will be automatically
'peeked' in the preview window.
Additionally, when a user clicks 'save' or the 'eye' icon, the files
will be uncompressed on-the-fly - so the user doesn't need to know or
care that they are compressed.
Of course, there's the whole issue of making the different tools read
and write compressed files - but that's another story.
It's actually not too complicated a story:
In Python, just call gzip.open instead of open.
In shell scripts, pipe the input file through "zcat -f FILE | program".
In Perl, use PerlIO::Gzip module.
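For the Python case, the change really is that small: gzip.open returns a file-like object, so line-oriented code keeps working unchanged. A sketch (using the text-mode flags of modern Python's gzip.open):

```python
import gzip
import os
import tempfile

# write a small gzipped "dataset" for the demo
path = os.path.join(tempfile.mkdtemp(), "dataset_0001.dat")
with gzip.open(path, "wt") as f:
    f.write("chr1\t100\t200\nchr2\t300\t400\n")

# reading code is unchanged apart from the open call:
# iteration, readline(), etc. all behave as with a plain file
lines = [line.rstrip("\n") for line in gzip.open(path, "rt")]
print(lines)
```

The shell-script trick ("zcat -f FILE | program") has the nice property of passing uncompressed files through untouched.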
Comments are welcome,
Regards,
Gordon.
First Patch -
Adding a function to "util" module, which returns a Gzip/Bzip2/Zip File
object (or a plain File object) based on the file type.
File type detection is done using the 'magic' module - I think it is
quite standard (on Ubuntu I got it with "apt-get install python-magic").
However, to get Galaxy to find this module I had to remove the "-ES"
from "run.sh" - I'm sure there's a better way to do it.
====================================================================
--- ./lib/galaxy/util/__init__.orig.py 2008-12-26 23:48:40.000000000 -0500
+++ ./lib/galaxy/util/__init__.py 2008-12-27 00:31:44.000000000 -0500
@@ -14,11 +14,41 @@ from galaxy.util.docutils_ext.htmlfrag i
pkg_resources.require( 'elementtree' )
from elementtree import ElementTree
+import magic # file detection
+import gzip # allow peeking into compressed files
+import bz2
+import zipfile
+
log = logging.getLogger(__name__)
_lock = threading.RLock()
gzip_magic = '\037\213'
+# Magic file detection
+magic_file = magic.open(magic.MAGIC_MIME)
+try:
+ magic_file.load()
+except:
+ magic_file = None
+
+def open_file_wrapper(filename):
+ file_mime = ""
+ if magic_file is not None:
+ try:
+ file_mime = magic_file.file(filename)
+ except:
+ file_mime = ""
+ if file_mime == "application/x-gzip":
+ return gzip.open(filename)
+ if file_mime == "application/x-bzip2":
+ return bz2.BZ2File(filename)
+ if file_mime == "application/x-zip":
+ return zipfile.ZipFile(filename)
+
+ #for all other mime types, return the raw file
+ return file(filename)
+
+
def synchronized(func):
"""This wrapper will serialize access to 'func' to a single
thread. Use it as a decorator."""
def caller(*params, **kparams):
====================================================================
Second Patch -
In the 'display' action of the root web controller, return the file with
the appropriate wrapper
====================================================================
--- ./lib/galaxy/web/controllers/root_orig.py 2008-12-26 23:56:01.000000000 -0500
+++ ./lib/galaxy/web/controllers/root.py 2008-12-27 00:35:43.000000000 -0500
@@ -153,7 +153,7 @@ class RootController( BaseController ):
m1 = trans.app.memory_usage.memory( m0, pretty=True )
log.info( "End of root/display, memory used increased by %s" % m1 )
try:
- return open( data.file_name )
+ return util.open_file_wrapper( data.file_name )
except:
return "This dataset contains no content"
else:
====================================================================
Third patch -
In the BaseController object, allow streaming on compressed files (not
just types.FileTypes):
====================================================================
--- ./lib/galaxy/web/framework/base_orig.py 2008-12-27 00:41:38.000000000 -0500
+++ ./lib/galaxy/web/framework/base.py 2008-12-27 00:41:37.000000000 -0500
@@ -25,6 +25,11 @@ from paste.response import HeaderDict
# For FieldStorage
import cgi
+# For auto-decompressing files
+import gzip
+import bz2
+import zipfile
+
log = logging.getLogger( __name__ )
class WebApplication( object ):
@@ -133,7 +138,7 @@ class WebApplication( object ):
if callable( body ):
# Assume the callable is another WSGI application to run
return body( environ, start_response )
- elif isinstance( body, types.FileType ):
+ elif isinstance( body, (types.FileType, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) ):
# Stream the file back to the browser
return send_file( start_response, trans, body )
else:
====================================================================
Fourth Patch -
In the generic Data datatype object, replace the file object with a
compressed file object in the peek function:
====================================================================
--- ./lib/galaxy/datatypes/data_orig.py 2008-12-26 23:21:41.000000000 -0500
+++ ./lib/galaxy/datatypes/data.py 2008-12-26 23:34:15.000000000 -0500
@@ -332,7 +332,7 @@ def get_file_peek( file_name, WIDTH=256,
count = 0
file_type = ''
data_checked = False
- for line in file( file_name ):
+ for line in util.open_file_wrapper( file_name ):
line = line[ :WIDTH ]
if not data_checked and line:
data_checked = True
====================================================================
Downloading entire history as tar.gz archive
by Assaf Gordon
Hello,
Did you ever wish you could download all the datasets in the current
history as a tar.gz file? I know I did...
The attached file will allow you to do that (assuming you have your own
local galaxy server).
Installation
-------------
1. put the attached file (history_exporter.py) in the galaxy directory,
in: [GALAXY]/lib/galaxy/web/controllers
2. Install GNU TAR version 1.20
Most Linux distributions don't yet ship tar 1.20, but it is required
for this module to work...
You'll probably have to install it from source:
http://www.gnu.org/software/tar/
3. Change line 82 (tar_exe variable) in 'history_exporter.py'
to point to the path of the new tar.
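As an aside, an archive with renamed entries can also be built with Python's stdlib tarfile module, which would remove the GNU tar 1.20 dependency entirely. A sketch under assumed names (history_exporter.py itself shells out to tar; the function and argument names here are invented):

```python
import os
import tarfile
import tempfile

def export_history(dataset_paths, out_path):
    """dataset_paths maps readable names to real dataset files;
    writes a tar.gz whose entries carry the readable names."""
    with tarfile.open(out_path, "w:gz") as tar:
        for nice_name, real_path in dataset_paths.items():
            # arcname renames dataset_XXXX.dat inside the archive
            tar.add(real_path, arcname=nice_name)

# tiny demo
d = tempfile.mkdtemp()
src = os.path.join(d, "dataset_0001.dat")
with open(src, "w") as f:
    f.write("ACGT\n")
out = os.path.join(d, "history.tar.gz")
export_history({"1_my_sequences.fasta": src}, out)
print(tarfile.open(out).getnames())  # ['1_my_sequences.fasta']
```

The downside is that tarfile buffers through Python rather than streaming via an external process, which may matter for very large histories.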
Usage
------
Reload galaxy and switch to a history with some datasets, then browse to:
http://YOUR-GALAXY-URL/history_exporter/export
This will start a download of a tar.gz file containing all the datasets
in the current history (properly renamed, not as 'dataset_XXXX').
Additionally, a README.txt file is added to the tarball, describing each
dataset.
To see how the tar file is created, go to:
http://YOUR-GALAXY-URL/history_exporter/debug
Comments are welcome,
Happy Holidays,
Gordon.
[hg] galaxy 1686: Since tempfiles seem to occasionally be left b...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/696fc4c02a0c
changeset: 1686:696fc4c02a0c
user: Nate Coraor <nate(a)bx.psu.edu>
date: Tue Dec 23 13:28:27 2008 -0500
description:
Since tempfiles seem to occasionally be left behind, allow logging of
open tempfiles, including a traceback (to determine callers) if
LOG_TEMPFILES is set in the environment when Galaxy starts. This
should be considered temporary and will be removed when it's determined
where/how this happens.
2 file(s) affected in this change:
lib/log_tempfile.py
scripts/paster.py
diffs (46 lines):
diff -r 82886ba9323b -r 696fc4c02a0c lib/log_tempfile.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/lib/log_tempfile.py Tue Dec 23 13:28:27 2008 -0500
@@ -0,0 +1,27 @@
+# override tempfile methods for debugging
+
+import tempfile, traceback
+
+import logging
+log = logging.getLogger( __name__ )
+
+class TempFile( object ):
+ def __init__( self ):
+ tempfile._NamedTemporaryFile = tempfile.NamedTemporaryFile
+ tempfile._mkstemp = tempfile.mkstemp
+ tempfile.NamedTemporaryFile = self.NamedTemporaryFile
+ tempfile.mkstemp = self.mkstemp
+ def NamedTemporaryFile( self, *args, **kwargs ):
+ f = tempfile._NamedTemporaryFile( *args, **kwargs )
+ try:
+ log.debug( ( "Opened tempfile %s with NamedTemporaryFile:\n" % f.name ) + "".join( traceback.format_stack() ) )
+ except AttributeError:
+ pass
+ return f
+ def mkstemp( self, *args, **kwargs ):
+ f = tempfile._mkstemp( *args, **kwargs )
+ try:
+ log.debug( ( "Opened tempfile %s with mkstemp:\n" % f[1] ) + "".join( traceback.format_stack() ) )
+ except TypeError:
+ pass
+ return f
diff -r 82886ba9323b -r 696fc4c02a0c scripts/paster.py
--- a/scripts/paster.py Mon Dec 22 13:35:01 2008 -0500
+++ b/scripts/paster.py Tue Dec 23 13:28:27 2008 -0500
@@ -16,6 +16,11 @@
from galaxy import eggs
import pkg_resources
+if 'LOG_TEMPFILES' in os.environ:
+ from log_tempfile import TempFile
+ _log_tempfile = TempFile()
+ import tempfile
+
pkg_resources.require( "PasteScript" )
from paste.script import command
[hg] galaxy 1684: Modified sort options in grouping tool to work...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/c1d3004f0613
changeset: 1684:c1d3004f0613
user: guru
date: Mon Dec 22 12:15:02 2008 -0500
description:
Modified sort options in grouping tool to work correctly on older and newer versions of unix sort. Also included functional tests.
4 file(s) affected in this change:
test-data/groupby_out1.dat
test-data/groupby_out2.dat
tools/stats/grouping.py
tools/stats/grouping.xml
diffs (126 lines):
diff -r 96f2c4630e62 -r c1d3004f0613 test-data/groupby_out1.dat
--- a/test-data/groupby_out1.dat Fri Dec 19 14:25:43 2008 -0500
+++ b/test-data/groupby_out1.dat Mon Dec 22 12:15:02 2008 -0500
@@ -1,21 +1,20 @@
-chr10 55251623.000000
-chr11 87588756.250000
-chr1 148052568.250000
-chr12 38440094.000000
-chr13 112381694.000000
-chr14 98710240.000000
-chr15 41666442.500000
-chr16 206638.000000
-chr18 50562378.250000
-chr19 59226196.750000
-chr20 33504194.750000
-chr2 118341365.500000
-chr21 33160676.750000
-chr2 220209905.500000
-chr22 30471242.250000
-chr5 131612441.500000
-chr6 108564320.750000
-chr7 115958079.000000
-chr8 118881131.000000
-chr9 128842832.750000
-chrX 145194871.500000
+chr1 1.48053e+08
+chr10 5.52516e+07
+chr11 8.75888e+07
+chr12 3.84401e+07
+chr13 1.12382e+08
+chr14 9.87102e+07
+chr15 4.16664e+07
+chr16 206638
+chr18 5.05624e+07
+chr19 5.92262e+07
+chr2 1.69276e+08
+chr20 3.35042e+07
+chr21 3.31607e+07
+chr22 3.04712e+07
+chr5 1.31612e+08
+chr6 1.08564e+08
+chr7 1.15958e+08
+chr8 1.18881e+08
+chr9 1.28843e+08
+chrX 1.45195e+08
diff -r 96f2c4630e62 -r c1d3004f0613 test-data/groupby_out2.dat
--- a/test-data/groupby_out2.dat Fri Dec 19 14:25:43 2008 -0500
+++ b/test-data/groupby_out2.dat Mon Dec 22 12:15:02 2008 -0500
@@ -1,2 +1,2 @@
-chr10 1700.00 ['NM_11', 'NM_10', 'test']
-chr22 1533.33 ['NM_17', 'NM_19', 'NM_18']
\ No newline at end of file
+chr10 1700
+chr22 1533.33
\ No newline at end of file
diff -r 96f2c4630e62 -r c1d3004f0613 tools/stats/grouping.py
--- a/tools/stats/grouping.py Fri Dec 19 14:25:43 2008 -0500
+++ b/tools/stats/grouping.py Mon Dec 22 12:15:02 2008 -0500
@@ -69,8 +69,9 @@
start a key at POS1, end it at POS2 (origin 1)
In other words, column positions start at 1 rather than 0, so
we need to add 1 to group_col.
+ if POS2 is not specified, the newer versions of sort will consider the entire line for sorting. To prevent this, we set POS2=POS1.
"""
- command_line = "sort -f -k " + str(group_col+1) + " -o " + tmpfile.name + " " + inputfile
+ command_line = "sort -f -k " + str(group_col+1) +"," + str(group_col+1) + " -o " + tmpfile.name + " " + inputfile
except Exception, exc:
stop_err( 'Initialization error -> %s' %str(exc) )
diff -r 96f2c4630e62 -r c1d3004f0613 tools/stats/grouping.xml
--- a/tools/stats/grouping.xml Fri Dec 19 14:25:43 2008 -0500
+++ b/tools/stats/grouping.xml Mon Dec 22 12:15:02 2008 -0500
@@ -1,4 +1,4 @@
-<tool id="Grouping1" name="Group" version="1.3.0">
+<tool id="Grouping1" name="Group" version="1.4.0">
<description>data by a column and perform aggregate operation on other columns.</description>
<command interpreter="python">
grouping.py
@@ -38,28 +38,26 @@
<requirements>
<requirement type="python-module">rpy</requirement>
</requirements>
- <tests>
- <!-- Test valid data -->
- <!-- TODO: fix this tool so that it works on various platforms
- The following test should then work...
- <test>
- <param name="input1" value="1.bed"/>
- <param name="groupcol" value="1"/>
- <param name="optype" value="mean"/>
- <param name="opcol" value="2"/>
- <param name="opround" value="no"/>
- <output name="out_file1" file="groupby_out1.dat"/>
+ <tests>
+ <!-- Test valid data -->
+ <test>
+ <param name="input1" value="1.bed"/>
+ <param name="groupcol" value="1"/>
+ <param name="optype" value="mean"/>
+ <param name="opcol" value="2"/>
+ <param name="opround" value="no"/>
+ <output name="out_file1" file="groupby_out1.dat"/>
+ </test>
+
+ <!-- Test data with an invalid value in a column -->
+ <test>
+ <param name="input1" value="1.tabular"/>
+ <param name="groupcol" value="1"/>
+ <param name="optype" value="mean"/>
+ <param name="opcol" value="2"/>
+ <param name="opround" value="no"/>
+ <output name="out_file1" file="groupby_out2.dat"/>
</test>
- -->
- <!-- Test data with an invalid value in a column -->
- <!-- TODO: fix this test...
- <test>
- <param name="input1" value="1.tabular"/>
- <param name="groupcol" value="1"/>
- <param name="operations" value="mean 2,c 3"/>
- <output name="out_file1" file="groupby_out2.dat"/>
- </test>
- -->
</tests>
<help>
[hg] galaxy 1685: Script to enumerate GOPS JOIN jobs that could ...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/82886ba9323b
changeset: 1685:82886ba9323b
user: guru
date: Mon Dec 22 13:35:01 2008 -0500
description:
Script to enumerate GOPS JOIN jobs that could have returned an incorrect result before the issue with minimum overlap was fixed last week.
3 file(s) affected in this change:
scripts/others/incorrect_gops_jobs.py
scripts/others/incorrect_gops_join_jobs.py
scripts/others/incorrect_gops_join_jobs.sh
diffs (126 lines):
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_jobs.py
--- a/scripts/others/incorrect_gops_jobs.py Mon Dec 22 12:15:02 2008 -0500
+++ b/scripts/others/incorrect_gops_jobs.py Mon Dec 22 13:35:01 2008 -0500
@@ -76,7 +76,10 @@
else:
new_cmd_line = " ".join(map(str,cmd_line.split()[:3])) + " " + new_output.name + " " + " ".join(map(str,cmd_line.split()[4:]))
job_output = cmd_line.split()[3]
- os.system(new_cmd_line)
+ try:
+ os.system(new_cmd_line)
+ except:
+ pass
diff_status = os.system('diff %s %s >> /dev/null' %(new_output.name, job_output))
if diff_status == 0:
continue
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_join_jobs.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/others/incorrect_gops_join_jobs.py Mon Dec 22 13:35:01 2008 -0500
@@ -0,0 +1,99 @@
+#!/usr/bin/env python
+"""
+Fetch gops_join wherein the use specified minimum coverage is not 1.
+"""
+
+from galaxy import eggs
+import sys, os, ConfigParser, tempfile
+import galaxy.app
+import galaxy.model.mapping
+import pkg_resources
+
+pkg_resources.require( "SQLAlchemy >= 0.4" )
+import sqlalchemy as sa
+
+assert sys.version_info[:2] >= ( 2, 4 )
+
+class TestApplication( object ):
+ """Encapsulates the state of a Universe application"""
+ def __init__( self, database_connection=None, file_path=None ):
+ print >> sys.stderr, "python path is: " + ", ".join( sys.path )
+ if database_connection is None:
+ raise Exception( "CleanupDatasetsApplication requires a database_connection value" )
+ if file_path is None:
+ raise Exception( "CleanupDatasetsApplication requires a file_path value" )
+ self.database_connection = database_connection
+ self.file_path = file_path
+ # Setup the database engine and ORM
+ self.model = galaxy.model.mapping.init( self.file_path, self.database_connection, engine_options={}, create_tables=False )
+
+def main():
+ ini_file = sys.argv[1]
+ conf_parser = ConfigParser.ConfigParser( {'here':os.getcwd()} )
+ conf_parser.read( ini_file )
+ configuration = {}
+ for key, value in conf_parser.items( "app:main" ):
+ configuration[key] = value
+ database_connection = configuration['database_connection']
+ file_path = configuration['file_path']
+ app = TestApplication( database_connection=database_connection, file_path=file_path )
+ jobs = {}
+ try:
+ for job in app.model.Job.filter( sa.and_( app.model.Job.table.c.create_time < '2008-12-16',
+ app.model.Job.table.c.state == 'ok',
+ app.model.Job.table.c.tool_id == 'gops_join_1',
+ sa.not_( app.model.Job.table.c.command_line.like( '%-m 1 %' ) )
+ )
+ ).all():
+ print "# processing job id %s" % str( job.id )
+ for jtoda in job.output_datasets:
+ print "# --> processing JobToOutputDatasetAssociation id %s" % str( jtoda.id )
+ hda = app.model.HistoryDatasetAssociation.get( jtoda.dataset_id )
+ print "# ----> processing HistoryDatasetAssociation id %s" % str( hda.id )
+ if not hda.deleted:
+ # Probably don't need this check, since the job state should suffice, but...
+ if hda.dataset.state == 'ok':
+ history = app.model.History.get( hda.history_id )
+ print "# ------> processing history id %s" % str( history.id )
+ if history.user_id:
+ cmd_line = str( job.command_line )
+ new_output = tempfile.NamedTemporaryFile('w')
+ new_cmd_line = " ".join(map(str,cmd_line.split()[:4])) + " " + new_output.name + " " + " ".join(map(str,cmd_line.split()[5:]))
+ job_output = cmd_line.split()[4]
+ try:
+ os.system(new_cmd_line)
+ except:
+ pass
+ diff_status = os.system('diff %s %s >> /dev/null' %(new_output.name, job_output))
+ if diff_status == 0:
+ continue
+ print "# --------> Outputs differ"
+ user = app.model.User.get( history.user_id )
+ jobs[ job.id ] = {}
+ jobs[ job.id ][ 'hda_id' ] = hda.id
+ jobs[ job.id ][ 'hda_name' ] = hda.name
+ jobs[ job.id ][ 'hda_info' ] = hda.info
+ jobs[ job.id ][ 'history_id' ] = history.id
+ jobs[ job.id ][ 'history_name' ] = history.name
+ jobs[ job.id ][ 'history_update_time' ] = history.update_time
+ jobs[ job.id ][ 'user_email' ] = user.email
+ except Exception, e:
+ print "# caught exception: %s" % str( e )
+
+ print "\n\n# Number of incorrect Jobs: %d\n\n" % ( len( jobs ) )
+ print "#job_id\thda_id\thda_name\thda_info\thistory_id\thistory_name\thistory_update_time\tuser_email"
+ for jid in jobs:
+ print '%s\t%s\t"%s"\t"%s"\t%s\t"%s"\t"%s"\t%s' % \
+ ( str( jid ),
+ str( jobs[ jid ][ 'hda_id' ] ),
+ jobs[ jid ][ 'hda_name' ],
+ jobs[ jid ][ 'hda_info' ],
+ str( jobs[ jid ][ 'history_id' ] ),
+ jobs[ jid ][ 'history_name' ],
+ jobs[ jid ][ 'history_update_time' ],
+ jobs[ jid ][ 'user_email' ]
+ )
+ sys.exit(0)
+
+if __name__ == "__main__":
+ main()
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_join_jobs.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/others/incorrect_gops_join_jobs.sh Mon Dec 22 13:35:01 2008 -0500
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+cd `dirname $0`/../..
+python ./scripts/others/incorrect_gops_join_jobs.py ./universe_wsgi.ini >> ./scripts/others/incorrect_gops_join_jobs.log
[hg] galaxy 1683: fix for short read score distribution tool. re...
by Greg Von Kuster
details: http://www.bx.psu.edu/hg/galaxy/rev/96f2c4630e62
changeset: 1683:96f2c4630e62
user: wychung
date: Fri Dec 19 14:25:43 2008 -0500
description:
fix for short read score distribution tool. restore test data.
3 file(s) affected in this change:
test-data/454Score.png
test-data/solexaScore.png
tools/metag_tools/short_reads_figure_score.py
diffs (35 lines):
diff -r b7aabc2553fc -r 96f2c4630e62 test-data/454Score.png
Binary file test-data/454Score.png has changed
diff -r b7aabc2553fc -r 96f2c4630e62 test-data/solexaScore.png
Binary file test-data/solexaScore.png has changed
diff -r b7aabc2553fc -r 96f2c4630e62 tools/metag_tools/short_reads_figure_score.py
--- a/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 12:25:52 2008 -0500
+++ b/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 14:25:43 2008 -0500
@@ -62,9 +62,9 @@
return score_points
def __main__():
-
+
invalid_lines = 0
-
+
infile_score_name = sys.argv[1].strip()
outfile_R_name = sys.argv[2].strip()
@@ -153,7 +153,6 @@
number_of_points = 20
else:
number_of_points = read_length
-
read_length_threshold = 100 # minimal read length for 454 file
score_points = []
score_matrix = []
@@ -180,7 +179,6 @@
big = 0
tmp_array.append( big )
score_points.append( tmp_array )
-
elif seq_method == '454':
# skip the last fasta sequence
score = ''
[hg] galaxy 1682: fixed for short read build distribution tool. ...
by Greg Von Kuster
details: http://www.bx.psu.edu/hg/galaxy/rev/b7aabc2553fc
changeset: 1682:b7aabc2553fc
user: wychung
date: Fri Dec 19 12:25:52 2008 -0500
description:
fixed for short read build distribution tool. remove unused arrays. also update test data output.
3 file(s) affected in this change:
test-data/454Score.png
test-data/solexaScore.png
tools/metag_tools/short_reads_figure_score.py
diffs (87 lines):
diff -r d38b593a27b4 -r b7aabc2553fc test-data/454Score.png
Binary file test-data/454Score.png has changed
diff -r d38b593a27b4 -r b7aabc2553fc test-data/solexaScore.png
Binary file test-data/solexaScore.png has changed
diff -r d38b593a27b4 -r b7aabc2553fc tools/metag_tools/short_reads_figure_score.py
--- a/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 11:05:54 2008 -0500
+++ b/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 12:25:52 2008 -0500
@@ -62,6 +62,9 @@
return score_points
def __main__():
+
+ invalid_lines = 0
+
infile_score_name = sys.argv[1].strip()
outfile_R_name = sys.argv[2].strip()
@@ -150,7 +153,7 @@
number_of_points = 20
else:
number_of_points = read_length
- quality_score = {} # quantile dictionary
+
read_length_threshold = 100 # minimal read length for 454 file
score_points = []
score_matrix = []
@@ -177,12 +180,7 @@
big = 0
tmp_array.append( big )
score_points.append( tmp_array )
- # quartile
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j, k ) ] = 1
+
elif seq_method == '454':
# skip the last fasta sequence
score = ''
@@ -203,12 +201,6 @@
score_points_tmp = merge_to_20_datapoints( score )
score_points.append( score_points_tmp )
tmp_array = score_points_tmp
- # quartile
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j ,k ) ] = 1
score = ''
else:
score = "%s %s" % ( score, line )
@@ -222,19 +214,16 @@
score_points_tmp = merge_to_20_datapoints( score )
score_points.append( score_points_tmp )
tmp_array = score_points_tmp
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j, k ) ] = 1
# reverse the matrix, for R
- tmp_array = []
for i in range( number_of_points - 1 ):
+ tmp_array = []
for j in range( len( score_points ) ):
- tmp_array.append( int( score_points[j][i] ) )
+ try:
+ tmp_array.append( int( score_points[j][i] ) )
+ except:
+ invalid_lines += 1
score_matrix.append( tmp_array )
- tmp_array = []
# generate pdf figures
#outfile_R_pdf = outfile_R_name
@@ -268,6 +257,8 @@
if invalid_scores > 0:
print 'Skipped %d invalid scores. ' % invalid_scores
+ if invalid_lines > 0:
+ print 'Skipped %d invalid lines. ' % invalid_lines
if empty_score_matrix_columns > 0:
print '%d missing scores in score_matrix. ' % empty_score_matrix_columns