Adding the 'name' content of the selection parameter
by Assaf Gordon
Hello,
Continuing the discussion from Dec. 19th,
http://mail.bx.psu.edu/pipermail/galaxy-user/2008-December/000422.html
The following hack adds an additional key/value which contains the
'name' of the selected option in a <select> input parameter.
With this hack, the following tool configuration is possible (note the
label attribute on the output):
==========================
<tool>
..
<inputs>
<param format="fasta" type="data" name="input1"/>
<param type="select" name="database" label="Database">
<option value="/home/gordon/long/path/hg18.fa">Human (hg18)</option>
<option value="/home/gordon/long/path/dm3.fa">Fly (dm3)</option>
<option value="/home/gordon/long/path/mm9.fa">Mouse (mm9)</option>
</param>
</inputs>
<outputs>
<!-- old way -->
<data format="txt" name="output" label="Blat $input1 on $database" />
<!-- new way -->
<data format="txt" name="output" label="Blat $input1 on $database_name" />
</outputs>
==========================
Currently, using "$database" in the output/label part will put the
*value* of the selected option (e.g. "/home/gordon/long/path/hg18.fa")
in the dataset's name. This doesn't really help the user understand what
the dataset contains.
With this hack, for each <select> input, a new variable is created (with
the "_name" suffix) which will contain the *name* of the selected option
(e.g. "Human (hg18)").
For example, if the user selected the second option,
then "$database" will contain "/home/gordon/long/path/dm3.fa", and
"$database_name" will contain "Fly (dm3)".
I'm well aware this is an ugly hack. A better way would be to have each
parameter (in the incoming dictionary) be an object (e.g. ToolParameter)
and not a string. Then we could write something like "$database.name" or
"$database.value".
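The parameter-object idea could look roughly like this (a hypothetical sketch; the class name and attributes are invented for illustration, this is not actual Galaxy code):

```python
class SelectedOption:
    """Hypothetical wrapper so templates could use $database.value and $database.name."""
    def __init__(self, value, name):
        self.value = value  # e.g. "/home/gordon/long/path/dm3.fa"
        self.name = name    # e.g. "Fly (dm3)"

    def __str__(self):
        # behave like the plain value, for backward compatibility
        # with templates that just write $database
        return self.value

database = SelectedOption("/home/gordon/long/path/dm3.fa", "Fly (dm3)")
print("Blat input on %s" % database.name)  # label uses the readable name
print("blat %s ..." % database)            # command line still gets the path
```

With such an object, the "_name" suffix trick below would become unnecessary.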
The hack contains two patches.
The first adds support for 'value_to_display_text' with dynamic options,
in "parameters/basic.py":
==============================================================
--- galaxy_prod/lib/galaxy/tools/parameters/basic.py
+++ galaxy_devel/lib/galaxy/tools/parameters/basic.py
@@ -576,7 +576,7 @@
elif len( value ) == 1:
value = value[0]
return value
- def value_to_display_text( self, value, app ):
+ def value_to_display_text( self, value, app, other_values={} ):
if isinstance( value, UnvalidatedValue ):
suffix = "\n(value not yet validated)"
value = value.value
@@ -584,10 +584,11 @@
suffix = ""
if not isinstance( value, list ):
value = [ value ]
- # FIXME: Currently only translating values back to labels if they
- # are not dynamic
if self.is_dynamic:
- rval = map( str, value )
+ rval = [ ]
+ for name in self.options.get_options ( app , other_values ) :
+ if ( name[1] in value ):
+ rval.append ( name[0] )
else:
options = list( self.static_options )
rval = []
==============================================================
The second patch iterates over the tool's parameters and, for each
<select> parameter, adds the additional key/value for the selected option:
==============================================================
--- galaxy_prod/lib/galaxy/tools/actions/__init__.py
+++ galaxy_devel/lib/galaxy/tools/actions/__init__.py
@@ -96,6 +96,18 @@
on_text = ""
# Add the dbkey to the incoming parameters
incoming[ "dbkey" ] = input_dbkey
+
+ ##28dec2008, gordon
+ ## For every 'Select' parameter, we get only the 'value' part of the selection from the HTTP request.
+ ## The following code gets the 'name' part for the selected option
+ selection_parameter_names = {}
+ for param_name, param_selected_value in incoming.iteritems():
+ param_obj = tool.get_param ( param_name )
+ if isinstance( param_obj, basic.SelectToolParameter):
+ param_selected_name = param_obj.value_to_display_text( param_selected_value, trans, incoming )
+ selection_parameter_names [ param_name + "_name" ] = param_selected_name.replace("\n","")
+ incoming.update( selection_parameter_names )
+
# Keep track of parent / child relationships, we'll create all the
# datasets first, then create the associations
parent_to_child_pairs = []
==============================================================
I tried to make the changes as non-intrusive as possible. I hope the
added 'other_values' parameter doesn't affect other places that call
'value_to_display_text' - but I'm not quite sure about that (I'm still
very new to Python). I've tested this hack with several tools, and it
seems to work (or at least not crash horribly).
Comments are welcome,
Gordon.
request: add newlines to dataset's info field
by Assaf Gordon
Hello,
I'd like to request/suggest a tiny change in the dataset's info field.
Whenever a tool outputs some lines into STDOUT, they are displayed in
the dataset's info field.
Currently, if the info field contains several lines, they are displayed
as a single line (because HTML collapses newlines into spaces).
The following patch causes the info field to be displayed correctly.
=================================================================
--- galaxy_prod/lib/galaxy/datatypes/data.py
+++ galaxy_devel/lib/galaxy/datatypes/data.py
@@ -132,7 +132,7 @@
def display_info(self, dataset):
"""Returns formated html of dataset info"""
try:
- return escape(dataset.info)
+ return escape(dataset.info).replace("\n", "<br>")
except:
return "info unavailable"
def validate(self, dataset):
==================================================================
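The order of operations in the patch matters: escaping first and only then inserting the `<br>` tags means any '<' characters inside the tool's output stay escaped, while the added markup does not. A minimal sketch of the idea, using the stdlib escape function (the actual escape used in data.py may differ):

```python
from xml.sax.saxutils import escape

# multi-line tool output, including a character that must be escaped
info = "1000 reads processed\nwarning: score < 0 in 3 reads"

# escape first, then add markup: the '<' in the text is made safe,
# and only our <br> reaches the browser unescaped
html = escape(info).replace("\n", "<br>")
print(html)  # 1000 reads processed<br>warning: score &lt; 0 in 3 reads
```

Doing the replace before the escape would turn the `<br>` tags themselves into `&lt;br&gt;`.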
Thanks,
gordon.
Showing dataset state in History List View
by Assaf Gordon
Hello,
I'd like to suggest a new feature:
In the history list view, instead of just showing the 'size' of the
history (i.e. how many datasets it contains), show the state of each
dataset.
The reason is that some users run long jobs (or workflows with many
steps) in several histories in parallel - and they want to quickly know:
1. which jobs are running,
2. which jobs are completed,
3. whether there were any errors.
Currently, they have to switch to each history, and look at the state of
the datasets.
With this feature, all one needs to do is look at the history view.
Using the same color keys for ok/queued/running/error states,
users can quickly know:
1. If there's a grey box - some jobs are still queued.
2. If there are no grey boxes but some yellow boxes - some jobs are
still running.
3. If there are no grey boxes and no yellow boxes - all jobs have been
completed.
4. If there are red boxes - some jobs failed.
Attached pictures illustrate the feature (at different states of jobs).
To add this feature:
Extract the attached 'list.mako.tar.gz' to GALAXY/templates/history
(overriding the current list.mako).
You'll also need to add the following function to
GALAXY/lib/galaxy/model/__init__.py
class History, line ~399
    # returns number of datasets matching the requested state
    # Added by gordon, 24dec2008
    def get_dataset_count( self, state ):
        count = 0
        for data in self.datasets:
            if data.state == state and not data.deleted: count += 1
        return count
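The helper can be exercised on its own; here is a quick sketch with stand-in objects (the real History and Dataset classes live in galaxy.model, these minimal versions exist only for the demo):

```python
# Stand-in classes for illustration only; in Galaxy the real
# History/Dataset objects come from galaxy.model.
class Dataset(object):
    def __init__(self, state, deleted=False):
        self.state = state
        self.deleted = deleted

class History(object):
    def __init__(self, datasets):
        self.datasets = datasets

    # same logic as the snippet above
    def get_dataset_count(self, state):
        count = 0
        for data in self.datasets:
            if data.state == state and not data.deleted:
                count += 1
        return count

h = History([Dataset('ok'), Dataset('running'), Dataset('ok', deleted=True)])
print(h.get_dataset_count('ok'))  # deleted datasets are not counted -> 1
```

The template then calls get_dataset_count once per state ('ok', 'queued', 'running', 'error') to draw the colored boxes.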
Another feature in this list.mako is the separation of 'switch to' link
from the 'delete' and 'rename' links -
The 'switch to' link is much more important and more frequently used
than the other two, and users have been complaining about how hard it is
to click (without accidentally clicking 'delete' or 'rename').
Comments are welcome,
Gordon.
Storing/Peeking/Downloading compressed files
by Assaf Gordon
Hello,
I'd like to request/suggest a feature:
Semi-Transparent support for compressed files.
The feature requires four (tiny) patches (detailed below).
With this feature, dataset files (/database/files/NNN/dataset_NNNN.dat)
can be stored compressed, and their content will be automatically
'peeked' in the preview window.
Additionally, when a user clicks 'save' or the 'eye' icon, the files
will be uncompressed on-the-fly - so the user doesn't need to know or
care that they are compressed.
Of course, there's the whole issue of making the different tools read
and write compressed files - but that's another story.
It's actually not too complicated a story:
In Python, just call gzip.open instead of open.
In shell scripts, pipe the input file through "zcat -f FILE | program".
In Perl, use PerlIO::Gzip module.
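For the Python case, the change really is that small: gzip.open returns a file-like object, so line-oriented code keeps working unchanged. A sketch (using the text-mode flags of modern Python's gzip.open):

```python
import gzip
import os
import tempfile

# write a small gzipped "dataset" for the demo
path = os.path.join(tempfile.mkdtemp(), "dataset_0001.dat")
with gzip.open(path, "wt") as f:
    f.write("chr1\t100\t200\nchr2\t300\t400\n")

# reading code is unchanged apart from the open call:
# iteration, readline(), etc. all behave as with a plain file
lines = [line.rstrip("\n") for line in gzip.open(path, "rt")]
print(lines)
```

The shell-script trick ("zcat -f FILE | program") has the nice property of passing uncompressed files through untouched.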
Comments are welcome,
Regards,
Gordon.
First Patch -
Adding a function to "util" module, which returns a Gzip/Bzip2/Zip File
object (or a plain File object) based on the file type.
File type detection is done using the 'magic' module - I think it is
quite standard (on Ubuntu I got it with "apt-get install python-magic").
However, to get Galaxy to find this module I had to remove the "-ES"
from "run.sh" - I'm sure there's a better way to do it.
====================================================================
--- ./lib/galaxy/util/__init__.orig.py 2008-12-26 23:48:40.000000000 -0500
+++ ./lib/galaxy/util/__init__.py 2008-12-27 00:31:44.000000000 -0500
@@ -14,11 +14,41 @@ from galaxy.util.docutils_ext.htmlfrag i
pkg_resources.require( 'elementtree' )
from elementtree import ElementTree
+import magic # file detection
+import gzip # allow peeking into compressed files
+import bz2
+import zipfile
+
log = logging.getLogger(__name__)
_lock = threading.RLock()
gzip_magic = '\037\213'
+# Magic file detection
+magic_file = magic.open(magic.MAGIC_MIME)
+try:
+ magic_file.load()
+except:
+ magic_file = None
+
+def open_file_wrapper(filename):
+ file_mime = ""
+ if magic_file is not None:
+ try:
+ file_mime = magic_file.file(filename)
+ except:
+ file_mime = ""
+ if file_mime == "application/x-gzip":
+ return gzip.open(filename)
+ if file_mime == "application/x-bzip2":
+ return bz2.BZ2File(filename)
+ if file_mime == "application/x-zip":
+ return zipfile.ZipFile(filename)
+
+ #for all other mime types, return the raw file
+ return file(filename)
+
+
def synchronized(func):
"""This wrapper will serialize access to 'func' to a single
thread. Use it as a decorator."""
def caller(*params, **kparams):
====================================================================
Second Patch -
In the 'display' action of the root web controller, return the file with
the appropriate wrapper
====================================================================
--- ./lib/galaxy/web/controllers/root_orig.py 2008-12-26 23:56:01.000000000 -0500
+++ ./lib/galaxy/web/controllers/root.py 2008-12-27 00:35:43.000000000 -0500
@@ -153,7 +153,7 @@ class RootController( BaseController ):
m1 = trans.app.memory_usage.memory( m0, pretty=True )
log.info( "End of root/display, memory used increased by %s" % m1 )
try:
- return open( data.file_name )
+ return util.open_file_wrapper( data.file_name )
except:
return "This dataset contains no content"
else:
====================================================================
Third patch -
In the BaseController object, allow streaming on compressed files (not
just types.FileTypes):
====================================================================
--- ./lib/galaxy/web/framework/base_orig.py 2008-12-27 00:41:38.000000000 -0500
+++ ./lib/galaxy/web/framework/base.py 2008-12-27 00:41:37.000000000 -0500
@@ -25,6 +25,11 @@ from paste.response import HeaderDict
# For FieldStorage
import cgi
+# For auto-decompressing files
+import gzip
+import bz2
+import zipfile
+
log = logging.getLogger( __name__ )
class WebApplication( object ):
@@ -133,7 +138,7 @@ class WebApplication( object ):
if callable( body ):
# Assume the callable is another WSGI application to run
return body( environ, start_response )
- elif isinstance( body, types.FileType ):
+ elif isinstance( body, (types.FileType, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) ):
# Stream the file back to the browser
return send_file( start_response, trans, body )
else:
====================================================================
Fourth Patch -
In the generic Data datatype object, replace the file object with a
compressed file object in the peek function:
====================================================================
--- ./lib/galaxy/datatypes/data_orig.py 2008-12-26 23:21:41.000000000 -0500
+++ ./lib/galaxy/datatypes/data.py 2008-12-26 23:34:15.000000000 -0500
@@ -332,7 +332,7 @@ def get_file_peek( file_name, WIDTH=256,
count = 0
file_type = ''
data_checked = False
- for line in file( file_name ):
+ for line in util.open_file_wrapper( file_name ):
line = line[ :WIDTH ]
if not data_checked and line:
data_checked = True
====================================================================
Downloading entire history as tar.gz archive
by Assaf Gordon
Hello,
Did you ever wish you could download all the datasets in the current
history as a tar.gz file? I know I did...
The attached file will allow you to do that (assuming you have your own
local galaxy server).
Installation
-------------
1. put the attached file (history_exporter.py) in the galaxy directory,
in: [GALAXY]/lib/galaxy/web/controllers
2. Install GNU TAR version 1.20
Most Linux distributions don't yet ship tar 1.20, but it is required
for this module to work...
You'll probably have to install it from source:
http://www.gnu.org/software/tar/
3. Change line 82 (tar_exe variable) in 'history_exporter.py'
to point to the path of the new tar.
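As an aside, an archive with renamed entries can also be built with Python's stdlib tarfile module, which would remove the GNU tar 1.20 dependency entirely. A sketch under assumed names (history_exporter.py itself shells out to tar; the function and argument names here are invented):

```python
import os
import tarfile
import tempfile

def export_history(dataset_paths, out_path):
    """dataset_paths maps readable names to real dataset files;
    writes a tar.gz whose entries carry the readable names."""
    with tarfile.open(out_path, "w:gz") as tar:
        for nice_name, real_path in dataset_paths.items():
            # arcname renames dataset_XXXX.dat inside the archive
            tar.add(real_path, arcname=nice_name)

# tiny demo
d = tempfile.mkdtemp()
src = os.path.join(d, "dataset_0001.dat")
with open(src, "w") as f:
    f.write("ACGT\n")
out = os.path.join(d, "history.tar.gz")
export_history({"1_my_sequences.fasta": src}, out)
print(tarfile.open(out).getnames())  # ['1_my_sequences.fasta']
```

The downside is that tarfile buffers through Python rather than streaming via an external process, which may matter for very large histories.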
Usage
------
Reload galaxy and switch to a history with some datasets, then browse to:
http://YOUR-GALAXY-URL/history_exporter/export
This will start a download of a tar.gz file containing all the datasets
in the current history (properly renamed, not as 'dataset_XXXX').
Additionally, a README.txt file is added to the tarball, describing each
dataset.
To see how the tar file is created, go to:
http://YOUR-GALAXY-URL/history_exporter/debug
Comments are welcome,
Happy Holidays,
Gordon.
[hg] galaxy 1686: Since tempfiles seem to occasionally be left b...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/696fc4c02a0c
changeset: 1686:696fc4c02a0c
user: Nate Coraor <nate(a)bx.psu.edu>
date: Tue Dec 23 13:28:27 2008 -0500
description:
Since tempfiles seem to occasionally be left behind, allow logging of
open tempfiles, including a traceback (to determine callers) if
LOG_TEMPFILES is set in the environment when Galaxy starts. This
should be considered temporary and will be removed when it's determined
where/how this happens.
2 file(s) affected in this change:
lib/log_tempfile.py
scripts/paster.py
diffs (46 lines):
diff -r 82886ba9323b -r 696fc4c02a0c lib/log_tempfile.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/lib/log_tempfile.py Tue Dec 23 13:28:27 2008 -0500
@@ -0,0 +1,27 @@
+# override tempfile methods for debugging
+
+import tempfile, traceback
+
+import logging
+log = logging.getLogger( __name__ )
+
+class TempFile( object ):
+ def __init__( self ):
+ tempfile._NamedTemporaryFile = tempfile.NamedTemporaryFile
+ tempfile._mkstemp = tempfile.mkstemp
+ tempfile.NamedTemporaryFile = self.NamedTemporaryFile
+ tempfile.mkstemp = self.mkstemp
+ def NamedTemporaryFile( self, *args, **kwargs ):
+ f = tempfile._NamedTemporaryFile( *args, **kwargs )
+ try:
+ log.debug( ( "Opened tempfile %s with NamedTemporaryFile:\n" % f.name ) + "".join( traceback.format_stack() ) )
+ except AttributeError:
+ pass
+ return f
+ def mkstemp( self, *args, **kwargs ):
+ f = tempfile._mkstemp( *args, **kwargs )
+ try:
+ log.debug( ( "Opened tempfile %s with mkstemp:\n" % f[1] ) + "".join( traceback.format_stack() ) )
+ except TypeError:
+ pass
+ return f
diff -r 82886ba9323b -r 696fc4c02a0c scripts/paster.py
--- a/scripts/paster.py Mon Dec 22 13:35:01 2008 -0500
+++ b/scripts/paster.py Tue Dec 23 13:28:27 2008 -0500
@@ -16,6 +16,11 @@
from galaxy import eggs
import pkg_resources
+if 'LOG_TEMPFILES' in os.environ:
+ from log_tempfile import TempFile
+ _log_tempfile = TempFile()
+ import tempfile
+
pkg_resources.require( "PasteScript" )
from paste.script import command
[hg] galaxy 1684: Modified sort options in grouping tool to work...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/c1d3004f0613
changeset: 1684:c1d3004f0613
user: guru
date: Mon Dec 22 12:15:02 2008 -0500
description:
Modified sort options in grouping tool to work correctly on older and newer versions of unix sort. Also included functional tests.
4 file(s) affected in this change:
test-data/groupby_out1.dat
test-data/groupby_out2.dat
tools/stats/grouping.py
tools/stats/grouping.xml
diffs (126 lines):
diff -r 96f2c4630e62 -r c1d3004f0613 test-data/groupby_out1.dat
--- a/test-data/groupby_out1.dat Fri Dec 19 14:25:43 2008 -0500
+++ b/test-data/groupby_out1.dat Mon Dec 22 12:15:02 2008 -0500
@@ -1,21 +1,20 @@
-chr10 55251623.000000
-chr11 87588756.250000
-chr1 148052568.250000
-chr12 38440094.000000
-chr13 112381694.000000
-chr14 98710240.000000
-chr15 41666442.500000
-chr16 206638.000000
-chr18 50562378.250000
-chr19 59226196.750000
-chr20 33504194.750000
-chr2 118341365.500000
-chr21 33160676.750000
-chr2 220209905.500000
-chr22 30471242.250000
-chr5 131612441.500000
-chr6 108564320.750000
-chr7 115958079.000000
-chr8 118881131.000000
-chr9 128842832.750000
-chrX 145194871.500000
+chr1 1.48053e+08
+chr10 5.52516e+07
+chr11 8.75888e+07
+chr12 3.84401e+07
+chr13 1.12382e+08
+chr14 9.87102e+07
+chr15 4.16664e+07
+chr16 206638
+chr18 5.05624e+07
+chr19 5.92262e+07
+chr2 1.69276e+08
+chr20 3.35042e+07
+chr21 3.31607e+07
+chr22 3.04712e+07
+chr5 1.31612e+08
+chr6 1.08564e+08
+chr7 1.15958e+08
+chr8 1.18881e+08
+chr9 1.28843e+08
+chrX 1.45195e+08
diff -r 96f2c4630e62 -r c1d3004f0613 test-data/groupby_out2.dat
--- a/test-data/groupby_out2.dat Fri Dec 19 14:25:43 2008 -0500
+++ b/test-data/groupby_out2.dat Mon Dec 22 12:15:02 2008 -0500
@@ -1,2 +1,2 @@
-chr10 1700.00 ['NM_11', 'NM_10', 'test']
-chr22 1533.33 ['NM_17', 'NM_19', 'NM_18']
\ No newline at end of file
+chr10 1700
+chr22 1533.33
\ No newline at end of file
diff -r 96f2c4630e62 -r c1d3004f0613 tools/stats/grouping.py
--- a/tools/stats/grouping.py Fri Dec 19 14:25:43 2008 -0500
+++ b/tools/stats/grouping.py Mon Dec 22 12:15:02 2008 -0500
@@ -69,8 +69,9 @@
start a key at POS1, end it at POS2 (origin 1)
In other words, column positions start at 1 rather than 0, so
we need to add 1 to group_col.
+ if POS2 is not specified, the newer versions of sort will consider the entire line for sorting. To prevent this, we set POS2=POS1.
"""
- command_line = "sort -f -k " + str(group_col+1) + " -o " + tmpfile.name + " " + inputfile
+ command_line = "sort -f -k " + str(group_col+1) +"," + str(group_col+1) + " -o " + tmpfile.name + " " + inputfile
except Exception, exc:
stop_err( 'Initialization error -> %s' %str(exc) )
diff -r 96f2c4630e62 -r c1d3004f0613 tools/stats/grouping.xml
--- a/tools/stats/grouping.xml Fri Dec 19 14:25:43 2008 -0500
+++ b/tools/stats/grouping.xml Mon Dec 22 12:15:02 2008 -0500
@@ -1,4 +1,4 @@
-<tool id="Grouping1" name="Group" version="1.3.0">
+<tool id="Grouping1" name="Group" version="1.4.0">
<description>data by a column and perform aggregate operation on other columns.</description>
<command interpreter="python">
grouping.py
@@ -38,28 +38,26 @@
<requirements>
<requirement type="python-module">rpy</requirement>
</requirements>
- <tests>
- <!-- Test valid data -->
- <!-- TODO: fix this tool so that it works on various platforms
- The following test should then work...
- <test>
- <param name="input1" value="1.bed"/>
- <param name="groupcol" value="1"/>
- <param name="optype" value="mean"/>
- <param name="opcol" value="2"/>
- <param name="opround" value="no"/>
- <output name="out_file1" file="groupby_out1.dat"/>
+ <tests>
+ <!-- Test valid data -->
+ <test>
+ <param name="input1" value="1.bed"/>
+ <param name="groupcol" value="1"/>
+ <param name="optype" value="mean"/>
+ <param name="opcol" value="2"/>
+ <param name="opround" value="no"/>
+ <output name="out_file1" file="groupby_out1.dat"/>
+ </test>
+
+ <!-- Test data with an invalid value in a column -->
+ <test>
+ <param name="input1" value="1.tabular"/>
+ <param name="groupcol" value="1"/>
+ <param name="optype" value="mean"/>
+ <param name="opcol" value="2"/>
+ <param name="opround" value="no"/>
+ <output name="out_file1" file="groupby_out2.dat"/>
</test>
- -->
- <!-- Test data with an invalid value in a column -->
- <!-- TODO: fix this test...
- <test>
- <param name="input1" value="1.tabular"/>
- <param name="groupcol" value="1"/>
- <param name="operations" value="mean 2,c 3"/>
- <output name="out_file1" file="groupby_out2.dat"/>
- </test>
- -->
</tests>
<help>
[hg] galaxy 1685: Script to enumerate GOPS JOIN jobs that could ...
by Nate Coraor
details: http://www.bx.psu.edu/hg/galaxy/rev/82886ba9323b
changeset: 1685:82886ba9323b
user: guru
date: Mon Dec 22 13:35:01 2008 -0500
description:
Script to enumerate GOPS JOIN jobs that could have returned an incorrect result before the issue with minimum overlap was fixed last week.
3 file(s) affected in this change:
scripts/others/incorrect_gops_jobs.py
scripts/others/incorrect_gops_join_jobs.py
scripts/others/incorrect_gops_join_jobs.sh
diffs (126 lines):
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_jobs.py
--- a/scripts/others/incorrect_gops_jobs.py Mon Dec 22 12:15:02 2008 -0500
+++ b/scripts/others/incorrect_gops_jobs.py Mon Dec 22 13:35:01 2008 -0500
@@ -76,7 +76,10 @@
else:
new_cmd_line = " ".join(map(str,cmd_line.split()[:3])) + " " + new_output.name + " " + " ".join(map(str,cmd_line.split()[4:]))
job_output = cmd_line.split()[3]
- os.system(new_cmd_line)
+ try:
+ os.system(new_cmd_line)
+ except:
+ pass
diff_status = os.system('diff %s %s >> /dev/null' %(new_output.name, job_output))
if diff_status == 0:
continue
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_join_jobs.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/others/incorrect_gops_join_jobs.py Mon Dec 22 13:35:01 2008 -0500
@@ -0,0 +1,99 @@
+#!/usr/bin/env python
+"""
+Fetch gops_join wherein the use specified minimum coverage is not 1.
+"""
+
+from galaxy import eggs
+import sys, os, ConfigParser, tempfile
+import galaxy.app
+import galaxy.model.mapping
+import pkg_resources
+
+pkg_resources.require( "SQLAlchemy >= 0.4" )
+import sqlalchemy as sa
+
+assert sys.version_info[:2] >= ( 2, 4 )
+
+class TestApplication( object ):
+ """Encapsulates the state of a Universe application"""
+ def __init__( self, database_connection=None, file_path=None ):
+ print >> sys.stderr, "python path is: " + ", ".join( sys.path )
+ if database_connection is None:
+ raise Exception( "CleanupDatasetsApplication requires a database_connection value" )
+ if file_path is None:
+ raise Exception( "CleanupDatasetsApplication requires a file_path value" )
+ self.database_connection = database_connection
+ self.file_path = file_path
+ # Setup the database engine and ORM
+ self.model = galaxy.model.mapping.init( self.file_path, self.database_connection, engine_options={}, create_tables=False )
+
+def main():
+ ini_file = sys.argv[1]
+ conf_parser = ConfigParser.ConfigParser( {'here':os.getcwd()} )
+ conf_parser.read( ini_file )
+ configuration = {}
+ for key, value in conf_parser.items( "app:main" ):
+ configuration[key] = value
+ database_connection = configuration['database_connection']
+ file_path = configuration['file_path']
+ app = TestApplication( database_connection=database_connection, file_path=file_path )
+ jobs = {}
+ try:
+ for job in app.model.Job.filter( sa.and_( app.model.Job.table.c.create_time < '2008-12-16',
+ app.model.Job.table.c.state == 'ok',
+ app.model.Job.table.c.tool_id == 'gops_join_1',
+ sa.not_( app.model.Job.table.c.command_line.like( '%-m 1 %' ) )
+ )
+ ).all():
+ print "# processing job id %s" % str( job.id )
+ for jtoda in job.output_datasets:
+ print "# --> processing JobToOutputDatasetAssociation id %s" % str( jtoda.id )
+ hda = app.model.HistoryDatasetAssociation.get( jtoda.dataset_id )
+ print "# ----> processing HistoryDatasetAssociation id %s" % str( hda.id )
+ if not hda.deleted:
+ # Probably don't need this check, since the job state should suffice, but...
+ if hda.dataset.state == 'ok':
+ history = app.model.History.get( hda.history_id )
+ print "# ------> processing history id %s" % str( history.id )
+ if history.user_id:
+ cmd_line = str( job.command_line )
+ new_output = tempfile.NamedTemporaryFile('w')
+ new_cmd_line = " ".join(map(str,cmd_line.split()[:4])) + " " + new_output.name + " " + " ".join(map(str,cmd_line.split()[5:]))
+ job_output = cmd_line.split()[4]
+ try:
+ os.system(new_cmd_line)
+ except:
+ pass
+ diff_status = os.system('diff %s %s >> /dev/null' %(new_output.name, job_output))
+ if diff_status == 0:
+ continue
+ print "# --------> Outputs differ"
+ user = app.model.User.get( history.user_id )
+ jobs[ job.id ] = {}
+ jobs[ job.id ][ 'hda_id' ] = hda.id
+ jobs[ job.id ][ 'hda_name' ] = hda.name
+ jobs[ job.id ][ 'hda_info' ] = hda.info
+ jobs[ job.id ][ 'history_id' ] = history.id
+ jobs[ job.id ][ 'history_name' ] = history.name
+ jobs[ job.id ][ 'history_update_time' ] = history.update_time
+ jobs[ job.id ][ 'user_email' ] = user.email
+ except Exception, e:
+ print "# caught exception: %s" % str( e )
+
+ print "\n\n# Number of incorrect Jobs: %d\n\n" % ( len( jobs ) )
+ print "#job_id\thda_id\thda_name\thda_info\thistory_id\thistory_name\thistory_update_time\tuser_email"
+ for jid in jobs:
+ print '%s\t%s\t"%s"\t"%s"\t%s\t"%s"\t"%s"\t%s' % \
+ ( str( jid ),
+ str( jobs[ jid ][ 'hda_id' ] ),
+ jobs[ jid ][ 'hda_name' ],
+ jobs[ jid ][ 'hda_info' ],
+ str( jobs[ jid ][ 'history_id' ] ),
+ jobs[ jid ][ 'history_name' ],
+ jobs[ jid ][ 'history_update_time' ],
+ jobs[ jid ][ 'user_email' ]
+ )
+ sys.exit(0)
+
+if __name__ == "__main__":
+ main()
diff -r c1d3004f0613 -r 82886ba9323b scripts/others/incorrect_gops_join_jobs.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/others/incorrect_gops_join_jobs.sh Mon Dec 22 13:35:01 2008 -0500
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+cd `dirname $0`/../..
+python ./scripts/others/incorrect_gops_join_jobs.py ./universe_wsgi.ini >> ./scripts/others/incorrect_gops_join_jobs.log
[hg] galaxy 1683: fix for short read score distribution tool. re...
by Greg Von Kuster
details: http://www.bx.psu.edu/hg/galaxy/rev/96f2c4630e62
changeset: 1683:96f2c4630e62
user: wychung
date: Fri Dec 19 14:25:43 2008 -0500
description:
fix for short read score distribution tool. restore test data.
3 file(s) affected in this change:
test-data/454Score.png
test-data/solexaScore.png
tools/metag_tools/short_reads_figure_score.py
diffs (35 lines):
diff -r b7aabc2553fc -r 96f2c4630e62 test-data/454Score.png
Binary file test-data/454Score.png has changed
diff -r b7aabc2553fc -r 96f2c4630e62 test-data/solexaScore.png
Binary file test-data/solexaScore.png has changed
diff -r b7aabc2553fc -r 96f2c4630e62 tools/metag_tools/short_reads_figure_score.py
--- a/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 12:25:52 2008 -0500
+++ b/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 14:25:43 2008 -0500
@@ -62,9 +62,9 @@
return score_points
def __main__():
-
+
invalid_lines = 0
-
+
infile_score_name = sys.argv[1].strip()
outfile_R_name = sys.argv[2].strip()
@@ -153,7 +153,6 @@
number_of_points = 20
else:
number_of_points = read_length
-
read_length_threshold = 100 # minimal read length for 454 file
score_points = []
score_matrix = []
@@ -180,7 +179,6 @@
big = 0
tmp_array.append( big )
score_points.append( tmp_array )
-
elif seq_method == '454':
# skip the last fasta sequence
score = ''
[hg] galaxy 1682: fixed for short read build distribution tool. ...
by Greg Von Kuster
details: http://www.bx.psu.edu/hg/galaxy/rev/b7aabc2553fc
changeset: 1682:b7aabc2553fc
user: wychung
date: Fri Dec 19 12:25:52 2008 -0500
description:
fixed for short read build distribution tool. remove unused arrays. also update test data output.
3 file(s) affected in this change:
test-data/454Score.png
test-data/solexaScore.png
tools/metag_tools/short_reads_figure_score.py
diffs (87 lines):
diff -r d38b593a27b4 -r b7aabc2553fc test-data/454Score.png
Binary file test-data/454Score.png has changed
diff -r d38b593a27b4 -r b7aabc2553fc test-data/solexaScore.png
Binary file test-data/solexaScore.png has changed
diff -r d38b593a27b4 -r b7aabc2553fc tools/metag_tools/short_reads_figure_score.py
--- a/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 11:05:54 2008 -0500
+++ b/tools/metag_tools/short_reads_figure_score.py Fri Dec 19 12:25:52 2008 -0500
@@ -62,6 +62,9 @@
return score_points
def __main__():
+
+ invalid_lines = 0
+
infile_score_name = sys.argv[1].strip()
outfile_R_name = sys.argv[2].strip()
@@ -150,7 +153,7 @@
number_of_points = 20
else:
number_of_points = read_length
- quality_score = {} # quantile dictionary
+
read_length_threshold = 100 # minimal read length for 454 file
score_points = []
score_matrix = []
@@ -177,12 +180,7 @@
big = 0
tmp_array.append( big )
score_points.append( tmp_array )
- # quartile
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j, k ) ] = 1
+
elif seq_method == '454':
# skip the last fasta sequence
score = ''
@@ -203,12 +201,6 @@
score_points_tmp = merge_to_20_datapoints( score )
score_points.append( score_points_tmp )
tmp_array = score_points_tmp
- # quartile
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j ,k ) ] = 1
score = ''
else:
score = "%s %s" % ( score, line )
@@ -222,19 +214,16 @@
score_points_tmp = merge_to_20_datapoints( score )
score_points.append( score_points_tmp )
tmp_array = score_points_tmp
- for j, k in enumerate( tmp_array ):
- if quality_score.has_key( ( j, k ) ):
- quality_score[ ( j, k ) ] += 1
- else:
- quality_score[ ( j, k ) ] = 1
# reverse the matrix, for R
- tmp_array = []
for i in range( number_of_points - 1 ):
+ tmp_array = []
for j in range( len( score_points ) ):
- tmp_array.append( int( score_points[j][i] ) )
+ try:
+ tmp_array.append( int( score_points[j][i] ) )
+ except:
+ invalid_lines += 1
score_matrix.append( tmp_array )
- tmp_array = []
# generate pdf figures
#outfile_R_pdf = outfile_R_name
@@ -268,6 +257,8 @@
if invalid_scores > 0:
print 'Skipped %d invalid scores. ' % invalid_scores
+ if invalid_lines > 0:
+ print 'Skipped %d invalid lines. ' % invalid_lines
if empty_score_matrix_columns > 0:
print '%d missing scores in score_matrix. ' % empty_score_matrix_columns