how to purge histories/datasets not accessed/updated for a certain time
by Chaolin Zhang
Hi,
We have a local mirror of the Galaxy system, and the disk fills up really quickly. Is there a way to purge histories/datasets that have not been accessed/updated for a certain period of time, regardless of whether the user has deleted them? It looks like the current cleanup scripts only purge deleted histories/datasets.
Thanks!
Chaolin
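A minimal sketch of one way to find candidates for this, assuming direct access to Galaxy's database (the history table and its update_time/deleted columns used below should be verified against your schema); flagged histories could then be marked deleted so the existing cleanup scripts purge them:

import datetime
import sqlalchemy as sa

N_DAYS = 180
# Hypothetical connection URL; point it at your Galaxy database.
engine = sa.create_engine("postgresql://galaxy@localhost/galaxy")
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=N_DAYS)

with engine.connect() as conn:
    rows = conn.execute(
        sa.text(
            "SELECT id, name, update_time FROM history "
            "WHERE deleted = false AND update_time < :cutoff"
        ),
        {"cutoff": cutoff},
    )
    for history_id, name, update_time in rows:
        # Report only; actually marking these deleted (so the cleanup
        # scripts purge them) should be a separate, deliberate UPDATE.
        print(history_id, name, update_time)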
10 years, 10 months
plotting/statistics tools suggestion
by Marshall Hampton
I've just started seriously looking at Galaxy, and I already have a
suggestion (everyone's a critic...): switch from using rpy or even
rpy2 to scipy/numpy/matplotlib for basic statistics and plots. I
wrestled a bit with getting rpy to work on my local setup and decided
it would be quicker to write my own plotting tool extensions, which
indeed turned out to be the case. Since you already require Python, it
seems much more natural to use Python-native tools.
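For illustration, a minimal sketch of what a Python-native plotting tool script could look like (the file names, column handling, and command line here are hypothetical, not an existing Galaxy tool):

# Usage from a tool wrapper might be: python histogram.py $input $output $column
import sys
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no X server needed
import matplotlib.pyplot as plt

infile, outfile, column = sys.argv[1], sys.argv[2], int(sys.argv[3])

values = np.loadtxt(infile, usecols=[column])
print("n=%d mean=%.4g sd=%.4g" % (values.size, values.mean(), values.std()))

plt.hist(values, bins=50)
plt.xlabel("column %d" % column)
plt.ylabel("count")
plt.savefig(outfile, format="png")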
To give some positive feedback, learning to write an extension was
surprisingly easy and encourages me to work on more.
I currently use Sage (http://www.sagemath.org/) to both analyze
next-generation sequence data (454 and Illumina) and create
interactive tools for the biologists I collaborate with. The Sage
project involves many of the same issues and challenges facing Galaxy.
Sage is based on python, but includes R. I realize that there are
many things you would want to do with R that aren't included in
scipy/biopython, so it might be worthwhile to look at how Sage wraps
R. It's far from perfect, but I prefer it to rpy2. In the Sage source
tree the interface is at: $SAGE_ROOT/devel/sage/sage/interfaces/r.py.
(Ugly online copy at:
http://hg.sagemath.org/sage-main/file/361a4ad7d52c/sage/interfaces/r.py).
-Marshall Hampton
Department of Mathematics and Statistics and the Integrated Biosciences Program
University of Minnesota Duluth
10 years, 10 months
suggestions for the SAM-to-BAM tool
by Assaf Gordon
Hi,
A couple of things could be slightly improved in the SAM-to-BAM tool:
1. "Reference list" is not informative (it's the technical way to say: "list of chromosomes and their sizes based on a FASTA file"). Users do not generally know what "reference list" is.
2. The "Locally Cached" option is not informative (I had to look in the source code to understand what it means).
What it should say is something like: "Get list of chromosomes/sizes based on the dataset's organism/database" (could be shorter, but should be friendly enough).
3. There's no option to take the chromosome list from the SAM file header. Some SAM files will contain the header (it can even be added in the standard bowtie tool wrapper), which saves the need to specify where to get the "reference list" from.
4. Autodetection in the "set-metadata" step would go a long way here: if the SAM file already has a header, there's no need to even ask about it.
If it doesn't have a header but has a DBKEY, then we're still OK.
If there's no DBKEY and no header, then complain or ask for a FASTA file from the current history.
(I realize that implementing this feature is hard and annoying; I don't mean to imply that it's easy to do, just that it's needed.)
5. Inside the Python script (sam_to_bam.py) there's a comment that says: "for some reason the samtools view command gzips the resulting bam file without warning".
Not sure why that matters, but "samtools view -u" will output an uncompressed BAM file.
6. samtools supports piping, so a lot of I/O (and some time) can be spared by piping the two commands together:
samtools view -u -b -S "INPUT.SAM" | samtools sort - OUTPUT
instead of running two separate commands and generating a temporary unsorted BAM file.
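Inside the wrapper script this could look roughly like the following sketch (the paths are illustrative; the old samtools sort syntax writes OUTPUT.bam from the given prefix):

import subprocess

sam_path = "INPUT.SAM"   # hypothetical input path
out_prefix = "OUTPUT"    # samtools sort appends .bam to this prefix

view = subprocess.Popen(
    ["samtools", "view", "-u", "-b", "-S", sam_path],
    stdout=subprocess.PIPE,
)
sort = subprocess.Popen(
    ["samtools", "sort", "-", out_prefix],
    stdin=view.stdout,
)
view.stdout.close()  # so view gets SIGPIPE if sort exits early
sort.communicate()
if view.wait() != 0 or sort.returncode != 0:
    raise RuntimeError("samtools view | sort pipeline failed")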
-gordon
10 years, 10 months
importing annotation with dataset
by Davide Cittaro
Hi, is it just me, or is the annotation not retained when I import a dataset from a library into a history (whereas "info" is)?
d
/*
Davide Cittaro, PhD
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/
10 years, 10 months
importing into history from library
by Glen Beane
We're running the latest galaxy-dist and noticed a problem importing files from libraries into our history.
If we click the down arrow to the right of a dataset and choose "Import this dataset into selected histories", it works fine. However, if our datasets are in folders within the library and we select multiple datasets and then do "Import selected datasets into histories", nothing happens. If I go to a library whose datasets are not in folders, then selecting multiple datasets works.
Has anyone else seen this?
--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
10 years, 10 months
bwa failure preparing job
by Branden Timm
Hi All,
I'm having issues running BWA for Illumina with the latest version of
Galaxy (5433:c1aeb2f33b4a).
It seems that the error is a Python IndexError raised while preparing the job:
Traceback (most recent call last):
File "/home/galaxy/galaxy-central/lib/galaxy/jobs/runners/local.py", line 58, in run_job
job_wrapper.prepare()
File "/home/galaxy/galaxy-central/lib/galaxy/jobs/__init__.py", line 371, in prepare
self.command_line = self.tool.build_command_line( param_dict )
File "/home/galaxy/galaxy-central/lib/galaxy/tools/__init__.py", line 1575, in build_command_line
command_line = fill_template( self.command, context=param_dict )
File "/home/galaxy/galaxy-central/lib/galaxy/util/template.py", line 9, in fill_template
return str( Template( source=template_text, searchList=[context] ) )
File "/home/galaxy/galaxy-central/eggs/Cheetah-2.2.2-py2.6-linux-x86_64-ucs4.egg/Cheetah/Template.py", line 1004, in __str__
return getattr(self, mainMethName)()
File "DynamicallyCompiledCheetahTemplate.py", line 106, in respond
IndexError: list index out of range
I checked the bwa_index.loc file for errors; the line for the reference genome I'm trying to map against seems correct (all whitespace consists of tab characters):
synpcc7002	synpcc7002	Synechococcus	/home/galaxy/galaxy-central/bwa_indices/SYNPCC7002
I'm not sure what the next troubleshooting step is, any ideas?
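One quick check that would rule out a malformed .loc line: make sure every non-comment line in bwa_index.loc splits into the expected number of tab-separated fields, since a short or space-separated line is a common cause of this kind of IndexError while the tool's Cheetah template is being filled. A minimal sketch (the four-column layout assumed here should be compared with the bwa_index.loc.sample shipped with Galaxy):

EXPECTED_FIELDS = 4

# Adjust the path to wherever your bwa_index.loc actually lives.
with open("tool-data/bwa_index.loc") as handle:
    for lineno, line in enumerate(handle, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != EXPECTED_FIELDS:
            print("line %d has %d tab-separated fields: %r"
                  % (lineno, len(fields), line))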
--
Branden Timm
btimm(a)glbrc.wisc.edu
10 years, 10 months
Error when adding datasets
by Louise-Amélie Schmitt
Hello everyone
I have an issue when trying to import new datasets or when putting a
dataset into a history. I saw Edward Kirton had the same problem but he
got no answer:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-May/002732.html
Here is the error message I get when clicking the "Add datasets" button
in a library, in the admin's "Manage data libraries" panel:
UnmappedInstanceError: Class '__builtin__.NoneType' is not mapped
URL:
http://manni/galaxy/library_common/upload_library_dataset?library_id=f2db...
File
'/g/funcgen/galaxy/eggs/WebError-0.8a-py2.6.egg/weberror/evalexception/middleware.py', line 364 in respond
app_iter = self.application(environ, detect_start_response)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/debug/prints.py',
line 98 in __call__
environ, self.app)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/wsgilib.py', line
539 in intercept_output
app_iter = application(environ, replacement_start_response)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/recursive.py',
line 80 in __call__
return self.application(environ, start_response)
File
'/g/funcgen/galaxy/lib/galaxy/web/framework/middleware/remoteuser.py',
line 109 in __call__
return self.app( environ, start_response )
File
'/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py',
line 632 in __call__
return self.application(environ, start_response)
File '/g/funcgen/galaxy/lib/galaxy/web/framework/base.py', line 145 in
__call__
body = method( trans, **kwargs )
File '/g/funcgen/galaxy/lib/galaxy/web/controllers/library_common.py',
line 907 in upload_library_dataset
trans.sa_session.refresh( history )
File
'/g/funcgen/galaxy/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.6.egg/sqlalchemy/orm/scoping.py', line 127 in do
return getattr(self.registry(), name)(*args, **kwargs)
File
'/g/funcgen/galaxy/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.6.egg/sqlalchemy/orm/session.py', line 925 in refresh
raise exc.UnmappedInstanceError(instance)
UnmappedInstanceError: Class '__builtin__.NoneType' is not mapped
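What the exception itself implies (not a confirmed fix, just a reading of the traceback): SQLAlchemy raises "Class '__builtin__.NoneType' is not mapped" when refresh() is handed None, so the history object in upload_library_dataset() is apparently None at that point. A minimal sketch of a defensive guard, assuming trans.get_history() is how that object is normally obtained:

# Hypothetical guard around the failing call in library_common.py
history = trans.get_history()        # may be None for a fresh session
if history is not None:
    trans.sa_session.refresh( history )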
Now, when does it occur?
I have two databases. One is a test database I created a month ago, and it
works fine even now. The other one, created recently, is supposed
to be the final database, but it keeps triggering the above
message, even when I drop it and create it all over again. I even tried
creating a third one, completely clean and new, but the problem remains. I
also tried trashing all the eggs so that Galaxy fetches fresh ones, with no
effect at all. The error's still there.
If you have any clue, I'll be forever grateful.
Cheers,
L-A
10 years, 11 months
Migration error: fields in MySQL
by John Eppley
I had an error upgrading my Galaxy instance. I got the following exception while migrating the db (during step 64->65):
sqlalchemy.exc.ProgrammingError: (ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'fields FROM form_definition' at line 1") u'SELECT id, fields FROM form_definition' []
It seems my version of MySQL (4.1.22-log) did not like 'fields' as a column name. If I alias the form_definition table as f and use f.fields, the error goes away. I also had to modify migration 76 for the same reason.
Here is my diff of the migrations dir:
diff -r 50e249442c5a lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py
--- a/lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py Thu Apr 07 08:39:07 2011 -0400
+++ b/lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py Fri Apr 15 11:09:26 2011 -0400
@@ -39,7 +39,7 @@
return ''
# Go through the entire table and add a 'name' attribute for each field
# in the list of fields for each form definition
- cmd = "SELECT id, fields FROM form_definition"
+ cmd = "SELECT f.id, f.fields FROM form_definition f"
result = db_session.execute( cmd )
for row in result:
form_definition_id = row[0]
@@ -53,7 +53,7 @@
field[ 'helptext' ] = field[ 'helptext' ].replace("'", "''").replace('"', "")
field[ 'label' ] = field[ 'label' ].replace("'", "''")
fields_json = to_json_string( fields_list )
- cmd = "UPDATE form_definition SET fields='%s' WHERE id=%i" %( fields_json, form_definition_id )
+ cmd = "UPDATE form_definition f SET f.fields='%s' WHERE f.id=%i" %( fields_json, form_definition_id )
db_session.execute( cmd )
# replace the values list in the content field of the form_values table with a name:value dict
cmd = "SELECT form_values.id, form_values.content, form_definition.fields" \
@@ -112,7 +112,7 @@
cmd = "UPDATE form_values SET content='%s' WHERE id=%i" %( to_json_string( values_list ), form_values_id )
db_session.execute( cmd )
# remove name attribute from the field column of the form_definition table
- cmd = "SELECT id, fields FROM form_definition"
+ cmd = "SELECT f.id, f.fields FROM form_definition f"
result = db_session.execute( cmd )
for row in result:
form_definition_id = row[0]
@@ -124,5 +124,5 @@
for index, field in enumerate( fields_list ):
if field.has_key( 'name' ):
del field[ 'name' ]
- cmd = "UPDATE form_definition SET fields='%s' WHERE id=%i" %( to_json_string( fields_list ), form_definition_id )
+ cmd = "UPDATE form_definition f SET f.fields='%s' WHERE id=%i" %( to_json_string( fields_list ), form_definition_id )
db_session.execute( cmd )
diff -r 50e249442c5a lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py
--- a/lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py Thu Apr 07 08:39:07 2011 -0400
+++ b/lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py Fri Apr 15 11:09:26 2011 -0400
@@ -32,7 +32,7 @@
def upgrade():
print __doc__
metadata.reflect()
- cmd = "SELECT form_values.id as id, form_values.content as field_values, form_definition.fields as fields " \
+ cmd = "SELECT form_values.id as id, form_values.content as field_values, form_definition.fields as fdfields " \
+ " FROM form_definition, form_values " \
+ " WHERE form_values.form_definition_id=form_definition.id " \
+ " ORDER BY form_values.id"
@@ -46,7 +46,7 @@
except Exception, e:
corrupted_rows = corrupted_rows + 1
# content field is corrupted
- fields_list = from_json_string( _sniffnfix_pg9_hex( str( row['fields'] ) ) )
+ fields_list = from_json_string( _sniffnfix_pg9_hex( str( row['fdfields'] ) ) )
field_values_str = _sniffnfix_pg9_hex( str( row['field_values'] ) )
try:
#Encoding errors? Just to be safe.
-j
10 years, 11 months
Support for subdirs in dataset extra_files_path
by Jim Johnson
Request is issue#494 https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in...
I'm finding that some qiime metagenomics applications build HTML results with an inherent directory structure. For some other applications, e.g. FastQC, I've been able to flatten the hierarchy and edit the HTML, but that appears problematic for qiime.
Galaxy hasn't supported a dataset extra_files_path hierarchy, though the developers don't seem opposed to the idea: http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003605.html
I added a route in lib/galaxy/web/buildapp.py and modified the dataset download code in lib/galaxy/web/controllers/dataset.py to traverse a hierarchy.
I don't think these changes add any security vulnerabilities (I tried the obvious ../../).
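For the security question, the usual guard is to resolve the requested filename and confirm it stays under the dataset's extra_files_path; a minimal sketch of that check (illustrative only, not part of the diffs below):

import os

def resolve_extra_file( extra_files_path, filename ):
    base = os.path.realpath( extra_files_path )
    target = os.path.realpath( os.path.join( base, filename ) )
    # Reject anything that escapes the base directory, e.g. "../../etc/passwd".
    if not target.startswith( base + os.sep ):
        raise ValueError( "Invalid path: %s" % filename )
    return target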
$ hg diff lib/galaxy/web/buildapp.py
diff -r 6ae06d89fec7 lib/galaxy/web/buildapp.py
--- a/lib/galaxy/web/buildapp.py Wed Mar 16 09:01:57 2011 -0400
+++ b/lib/galaxy/web/buildapp.py Wed Mar 16 10:24:13 2011 -0500
@@ -94,6 +94,8 @@
webapp.add_route( '/async/:tool_id/:data_id/:data_secret', controller='async', action='index', tool_id=None, data_id=None, data_secret=None )
webapp.add_route( '/:controller/:action', action='index' )
webapp.add_route( '/:action', controller='root', action='index' )
+ # allow for subdirectories in extra_files_path
+ webapp.add_route( '/datasets/:dataset_id/display/{filename:.+?}', controller='dataset', action='display', dataset_id=None, filename=None)
webapp.add_route( '/datasets/:dataset_id/:action/:filename', controller='dataset', action='index', dataset_id=None, filename=None)
webapp.add_route( '/display_application/:dataset_id/:app_name/:link_name/:user_id/:app_action/:action_param', controller='dataset', action='display_application', dataset_id=None, user_id=None, app_name = None, link_name = None, app_action = None, action_param = None )
webapp.add_route( '/u/:username/d/:slug', controller='dataset', action='display_by_username_and_slug' )
$
$ hg diff lib/galaxy/web/controllers/dataset.py
diff -r 6ae06d89fec7 lib/galaxy/web/controllers/dataset.py
--- a/lib/galaxy/web/controllers/dataset.py Wed Mar 16 09:01:57 2011 -0400
+++ b/lib/galaxy/web/controllers/dataset.py Wed Mar 16 10:24:29 2011 -0500
@@ -266,17 +266,18 @@
log.exception( "Unable to add composite parent %s to temporary library download archive" % data.file_name)
msg = "Unable to create archive for download, please report this error"
messagetype = 'error'
- flist = glob.glob(os.path.join(efp,'*.*')) # glob returns full paths
- for fpath in flist:
- efp,fname = os.path.split(fpath)
- try:
- archive.add( fpath,fname )
- except IOError:
- error = True
- log.exception( "Unable to add %s to temporary library download archive" % fname)
- msg = "Unable to create archive for download, please report this error"
- messagetype = 'error'
- continue
+ for root, dirs, files in os.walk(efp):
+ for fname in files:
+ fpath = os.path.join(root,fname)
+ rpath = os.path.relpath(fpath,efp)
+ try:
+ archive.add( fpath,rpath )
+ except IOError:
+ error = True
+ log.exception( "Unable to add %s to temporary library download archive" % rpath)
+ msg = "Unable to create archive for download, please report this error"
+ messagetype = 'error'
+ continue
if not error:
if params.do_action == 'zip':
archive.close()
10 years, 11 months
Sharing tool definitions between XML files
by Peter
Hi all,
I'm wondering if there is any established way to share parameter
definitions between tool wrapper XML files?
For example, I am currently working on NCBI BLAST+ wrappers, and these
tools have a lot in common: the output format option will
be a select parameter, and it will be the same for blastn, blastp,
tblastn, etc. I can just cut and paste the definition, but this seems
inelegant and will be a long-term maintenance burden if it ever needs
updating (lots of places would need to be updated the same way). I'd
like to have a shared XML snippet defining this parameter which I
could then reference/include from each tool wrapper that needs it.
I'm thinking something like XML's <!ENTITY ...> mechanism might work. Is this possible?
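At the plain-XML level, external entities do allow this kind of include. A minimal sketch (the entity and file names are invented; whether Galaxy's tool parser actually expands external entities would still need checking, since ElementTree-based parsers typically do not by default):

<!DOCTYPE tool [
  <!ENTITY blast_outfmt SYSTEM "blast_output_format.xml">
]>
<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn">
  <inputs>
    <param name="query" type="data" format="fasta" label="Query sequences"/>
    <!-- shared <param> definition pulled in from blast_output_format.xml -->
    &blast_outfmt;
  </inputs>
</tool>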
Peter
10 years, 11 months