January 2012 - galaxy-dev - lists.galaxyproject.org

DRMAA error with latest update 26920e20157f
by Shantanu Pavgi 30 Jan '12

I am getting the following error after updating to the latest galaxy-dist revision, '26920e20157f'. The Python version is 2.6.6.

{{{
galaxy.jobs.runners.drmaa ERROR 2012-01-29 21:00:28,577 Uncaught exception queueing job
Traceback (most recent call last):
  File "/projects/galaxy/galaxy-165/lib/galaxy/jobs/runners/drmaa.py", line 140, in run_next
    self.queue_job( obj )
  File "/projects/galaxy/galaxy-165/lib/galaxy/jobs/runners/drmaa.py", line 190, in queue_job
    command_line )
TypeError: not all arguments converted during string formatting
}}}

Is anyone else experiencing this issue? The system works fine when I roll back to revision 'b258de1e6cea'. Are there any additional configuration details required with the latest revision that I am missing?

--
Shantanu
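For readers unfamiliar with this message: Python raises it when old-style `%` formatting receives more values than the template has placeholders. The sketch below only illustrates that general mechanism. The template and values are made up for the example and are not the code at line 190 of drmaa.py.

```python
# Hypothetical template with two %s placeholders (not Galaxy's real template).
template = "cd %s; %s"

# Two values for two placeholders: formatting succeeds.
ok = template % ("/tmp/job", "echo hi")

# Three values for two placeholders: Python raises TypeError.
try:
    template % ("/tmp/job", "echo hi", "extra")
except TypeError as exc:
    message = str(exc)
```

When a template or its argument tuple changes between revisions and the two get out of step, this is the error that results, so comparing the format string and its arguments in the two revisions is a reasonable first check.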

Disk size of all users?
by Bossers, Alex 30 Jan '12

Reading this on the wiki: http://wiki.g2.bx.psu.edu/Admin/Disk%20Quotas

It shows that there is a record in the DB tracking each user's allocated disk space for histories. Is there a convenient way to get this info using the Galaxy admin panels? That way we can track heavy users and urge them to clean up or improve their data practices.

Thanks,
Alex
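Until an admin-panel view is available, one option is to query the database directly. The following is a minimal sketch, not a Galaxy feature: it assumes the per-user figure is stored in a `disk_usage` column (in bytes) on the `galaxy_user` table, which should be checked against your own schema first. The function name `top_disk_users` is hypothetical, and the example uses Python's `sqlite3` module; a PostgreSQL or MySQL instance would need the matching driver and placeholder style.

```python
import sqlite3


def top_disk_users(conn, limit=10):
    """Return (email, bytes_used) rows for the heaviest users, largest first.

    Assumes a galaxy_user table with email and disk_usage columns;
    disk_usage is an assumption and may differ in your Galaxy revision.
    """
    cur = conn.execute(
        "SELECT email, COALESCE(disk_usage, 0) AS used "
        "FROM galaxy_user ORDER BY used DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

Calling `top_disk_users(sqlite3.connect("/path/to/universe.sqlite"), 20)` would list the twenty largest consumers, where the database path is a placeholder for your own.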

How to allow anonymous users to run workflows?
by Tim te Beek 29 Jan '12

Hi all,

I was wondering how I can allow anonymous users to run workflows in my local Galaxy instance; currently, users need to be logged in to run workflows. I'd like to drop this requirement in light of the intended publication of a workflow in a journal which demands that "Web services must not require mandatory registration by the user."

Could any of you tell me how I can accomplish this? I've seen the option to use an external authentication method, which could be employed to artificially 'log in' anonymous users for a single session, but it appears this would also disable the normal user administration mechanisms in Galaxy, so I'm not sure this would be a good fit.

Any hints on how to proceed, either via this route or otherwise, would be much appreciated.

Best regards,
Tim

January 27, 2012 Galaxy Distribution & News Brief
by Jennifer Jackson 28 Jan '12

January 27, 2012 Galaxy Distribution & News Brief

Complete News Brief
* http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_01_27

Highlights:
* Important metadata and Python 2.5 support corrections
* SAMtools upgraded for version 0.1.18. Mpileup added.
* Dynamic filtering, easy color options, and quicker indexing enhance Trackster
* Set up your Galaxy instance to run cluster jobs as the real user, not the Galaxy owner
* Improvements to metadata handling and searching in the Tool Shed
* Improved solutions for schema access, jobs management, & workflow imports and inputs.
* New datatypes (Eland, XML), multiple tool enhancements, and bug fixes.

Get Galaxy!
* http://getgalaxy.org

new:
  % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist

upgrade:
  % hg pull -u -r 26920e20157f

Read the release announcement and see the prior release history
* http://wiki.g2.bx.psu.edu/DevNewsBriefs/

Need help with a local instance? Search with our custom google tools!
* http://wiki.g2.bx.psu.edu/Mailing%20Lists#Searching

And consider subscribing to the galaxy-dev mailing list!
* http://wiki.g2.bx.psu.edu/Mailing%20Lists#Subscribing_and_Unsubscribing

--
Jennifer Jackson
Galaxy Team
http://usegalaxy.org
http://galaxyproject.org
http://galaxyproject.org/wiki/Support

input param with type="data" multiple="true" working?
by Leandro Hermida 26 Jan '12

Hello,

There have been previous requests/questions (some of them mine) about fixing Galaxy tool functionality to enable a multiple-select menu for input data from the history, using the following:

  <param ... type="data" multiple="true" ... />

instead of the cumbersome <repeat> tags and the form they produce. Is this working in the latest Galaxy build?

kind regards,
Leandro

how to use projects for fair-share on compute-cluster
by Edward Kirton 25 Jan '12

Galaxy sites usually do all work on a compute cluster, with all jobs submitted as a "galaxy" unix user, so there isn't any "fair-share" accounting between users. Other sysops have created a solution to run jobs as the actual unix user, which may be feasible for an intranet site but is undesirable, for security reasons, for a site accessible via the internet. A simpler and more secure method to enable fair-share is to use projects.

Here's a simple scenario and straightforward solution: multiple groups in an organization use the same galaxy site and it is desirable to enable fair-share accounting between the groups. All users in a group consume the same fair-share, which is generally acceptable.

1) Configure the scheduler with a project for each group, configure each user to use their group's project by default, and grant the galaxy user access to submit jobs to any project; all users should be associated with a project. There's a good chance your grid is already configured this way.

2) Create a database which maps galaxy user id to a project; i use a cron job to create a standalone sqlite3 db. Since this is site-specific, code is not provided, but hints are given below. Rather than having a separate database, the proj could have been added to the galaxy db, but i sought to minimize my changes.

3) Add a snippet of code to drmaa.py's queue_job method to look up the proj from job_wrapper.user_id and append it to jt.nativeSpecification; see below.

Here are the changes required. It's small enough that I didn't do this as a clone/patch.

(1) lib/galaxy/jobs/runners/drmaa.py: add the import near the top of the file, and the marked block after the native_spec line in queue_job

    import sqlite3
    ...
    native_spec = self.get_native_spec( runner_url )

    # BEGIN ADD USER'S PROJ
    if self.app.config.user_proj_map_db is not None:
        try:
            conn = sqlite3.connect(self.app.config.user_proj_map_db)
            try:
                c = conn.cursor()
                # GID is the galaxy user id, PROJ the scheduler project
                c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
                row = c.fetchone()
            finally:
                # always release the connection, even if the query fails
                conn.close()
            # a user with no mapping (row is None) falls through to the except
            native_spec += ' -P ' + row[0]
        except Exception:
            log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
    # END ADD USER'S PROJ

(2) lib/galaxy/config.py: add support for the user_proj_map_db variable

    self.user_proj_map_db = resolve_path( kwargs.get( "user_proj_map_db", None ), self.root )

(3) universe_wsgi.ini:

    user_proj_map_db = /some/path/to/user_proj_map_db.sqlite

(4) Here are some suggestions to help get you started on a script to make the sqlite3 db.

a) parse ldap tree example (to get uid:email):

    ldapsearch -LLL -x -b 'ou=aliases,dc=jgi,dc=gov'

b) parse scheduler config (to get uid:proj):

    qconf -suserl | /usr/bin/xargs -I '{}' qconf -suser '{}' | egrep 'name|default_project'

c) query galaxy db (to get gid:email):

    select id, email from galaxy_user;

The limitation of this method is that all jobs submitted by a user will always be charged to the same project (which may be okay, depending on how your organization uses projects). However, a user may have access to several projects and may wish to associate some jobs with a particular project. This could be accomplished by adding an option to the user preferences; a user would choose a project from their available projects, and any jobs submitted would have to record their currently chosen project. Alternatively, histories could be associated with a particular project. This solution would require significant changes to galaxy, so i haven't implemented it (and the simple solution works well enough for me).

Edward Kirton
US DOE JGI
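The three hints in step (4) can be combined roughly as follows. This is a sketch under stated assumptions, not the script used at JGI: it assumes the outputs of the ldapsearch, qconf, and galaxy_user queries have already been parsed into plain dictionaries, and that emails can be matched case-insensitively. The function name `build_user_proj_map` and its parameters are hypothetical; the table and column names (USER_PROJ, GID, PROJ) are the ones the drmaa.py snippet queries.

```python
import sqlite3


def build_user_proj_map(conn, uid_to_email, uid_to_proj, gid_to_email):
    """Rebuild USER_PROJ(GID, PROJ) by joining uid:email, uid:proj and gid:email.

    Returns the number of galaxy users that could be mapped to a project.
    """
    # email -> project, via the unix uid shared by the LDAP and scheduler data
    email_to_proj = {}
    for uid, email in uid_to_email.items():
        proj = uid_to_proj.get(uid)
        if proj:
            email_to_proj[email.lower()] = proj

    # galaxy user id -> project, via the email address
    rows = [
        (gid, email_to_proj[email.lower()])
        for gid, email in gid_to_email.items()
        if email.lower() in email_to_proj
    ]

    conn.execute(
        "CREATE TABLE IF NOT EXISTS USER_PROJ "
        "(GID INTEGER PRIMARY KEY, PROJ TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM USER_PROJ")  # full refresh on every cron run
    conn.executemany("INSERT INTO USER_PROJ (GID, PROJ) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

A cron job would open the file named by user_proj_map_db with `sqlite3.connect(...)`, call this function with freshly parsed mappings, and close the connection. Users without a default project are simply left out, so the lookup in queue_job logs the debug message for them and submits without `-P`.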

MergeSamFiles.jar and TMPDIR
by Glen Beane 25 Jan '12

We recently updated to the latest galaxy-dist and learned that the sam_merge.xml tool now uses Picard's MergeSamFiles.jar to merge the files instead of the samtools merge wrapper, sam_merge.py. This is a problem for us because MergeSamFiles.jar does not honor $TMPDIR when creating temporary file names (the JVM developers inexplicably hard-code the value of java.io.tmpdir to /tmp on Unix/Linux rather than doing the Right Thing). On our cluster, TMPDIR is set to something like /scratch/batch_job_id/. This location has plenty of free space, but /tmp does not, and now we can't successfully merge largeish BAM files.

In case anyone else is bitten by this, I think there are two options:

1) The Picard tools take an optional TMP_DIR= argument that lets us specify the location we want to use for a temporary directory. Initially we modified the .xml to add TMP_DIR=\$TMPDIR to the arguments to MergeSamFiles.jar. This works, but we could potentially need to do this with multiple Picard tools, not just MergeSamFiles.

2) Add something like "export _JAVA_OPTIONS=-Djava.io.tmpdir=$TMPDIR" to the .bashrc file for the Galaxy user. This is the solution I am probably going to go with.

--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
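For anyone who calls Picard from a wrapper script rather than from the tool XML, both approaches can be applied when the command line is assembled. This is a minimal sketch, not Galaxy's wrapper: the function name `picard_merge_command` is hypothetical, the fallback to /tmp is an assumption, and INPUT=/OUTPUT= follow Picard's KEY=VALUE argument style.

```python
import os


def picard_merge_command(jar, inputs, output, environ=None):
    """Build a MergeSamFiles command line that keeps temp files in $TMPDIR.

    environ defaults to the current process environment; it is a parameter
    only so the behaviour can be checked without touching os.environ.
    """
    environ = os.environ if environ is None else environ
    tmpdir = environ.get("TMPDIR", "/tmp")  # assumed fallback when TMPDIR is unset

    # JVM-wide setting: covers any code that relies on java.io.tmpdir
    cmd = ["java", "-Djava.io.tmpdir=" + tmpdir, "-jar", jar]
    cmd += ["INPUT=" + path for path in inputs]
    # Picard-specific setting, as described in the post
    cmd += ["OUTPUT=" + output, "TMP_DIR=" + tmpdir]
    return cmd
```

The resulting list can be passed directly to `subprocess.call`, which avoids the shell quoting that made `\$TMPDIR` necessary in the XML.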

problem with "Input dataset" workflow control feature and custom non-subclass datatypes
by Leandro Hermida 25 Jan '12

Hi,

There seems to be a weird bug with the "Input dataset" workflow control feature. It is hard to explain clearly, but I'll try my best.

Suppose you define a custom datatype that is a simple subclass of an existing galaxy datatype, e.g.:

  <datatype extension="myext" type="galaxy.datatypes.data:Text" subclass="True" display_in_upload="true"/>

If this datatype will be the input to a workflow where you want to use the multiple input files feature, you must put an "Input dataset" box at the beginning of the workflow in the workflow editor and connect it.

Now suppose you define a custom datatype that is its own custom class, e.g.:

  <datatype extension="myext" type="galaxy.datatypes.data:MyExt" display_in_upload="true"/>

with a simple class in lib/galaxy/datatypes/data.py, e.g.:

  class MyExt( Data ):
      file_ext = "myext"

If this datatype will be the input data to a workflow and you have an "Input dataset" box at the beginning, then for some reason the drop-down menu (or multi-select) won't list files of this type from your history; it just ignores them.

What is strange is that if I edit the workflow, remove the beginning "Input dataset" box, and start the workflow with just the first tool, which has this custom datatype as an input parameter, then everything shows up properly when I try to run the workflow :-/

Hope I explained this ok. It seems like something is broken with the "Input dataset" workflow control feature.

best,
Leandro

Error Msg: Cluster could not complete job
by Dave Lin 25 Jan '12

Dear Galaxy Support,

I'm getting the following error message when trying to process larger SOLiD files.

ERROR MESSAGE: "Cluster could not complete job"

- Compute Quality Statistic: got the error message on the first run. Ran ok after re-running the job.
- The subsequent job, converting qual/csfasta -> fastq, failed with the same error message.
- Doesn't seem to happen on small SOLiD files.

Potentially relevant information:

1. Cloud instance on Amazon, Large instance.
2. Only one master node on the cluster.
3. Has been updated using the update feature to a version as of late last week.
4. Only 1 user on the system right now, so there shouldn't be any competing load.
5. I downloaded a bunch of data files, so the volume was at 94%. Currently in the process of expanding the volume.

Question: Is this expected behavior, or have I misconfigured something (e.g. some timeout value)? Any suggestions?

Thanks in advance,
Dave

P.S. I'm new to galaxy and impressed so far. Keep up the great work.

software installs: PATH vs env.sh
by Andrew Warren 25 Jan '12

Hello,

We recently transitioned from a CloudMan instance of galaxy to our own cluster and started having problems with calls to tools from within other tools. For example, when Tophat calls bowtie-inspect, it's not finding the executable. To fix this I listed bowtie in the requirements section of the tophat wrapper, like so:

  <tool id="tophat" name="Tophat for Illumina" version="1.5.0">
    <description>Find splice junctions using RNA-seq data</description>
    <version_command>tophat --version</version_command>
    <requirements>
      <requirement type="package">tophat</requirement>
      <requirement type='package'>bowtie</requirement>
      <requirement type="package">samtools</requirement>
    </requirements>

Now I am wondering: is it generally expected that all tools used by galaxy will have their executables on the galaxy user's PATH? Is the above a good solution? Or is there something else likely amiss with our galaxy setup? I think we recently pulled updates for some major tool_shed release, but I haven't been able to determine whether any of the tools listed above were affected by that.

Wish I were in Český Krumlov asking this question. Missed the registration deadline...doh.

Thanks,
Andrew Warren
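A quick way to tell which of the two situations applies is to check, as the galaxy user on a cluster node, which executables are actually visible on PATH. The helper below is a diagnostic sketch only; `missing_executables` is a hypothetical name, and it relies on `shutil.which` from the Python standard library (Python 3.3 or later).

```python
import shutil


def missing_executables(names, path=None):
    """Return the names that cannot be found on the search path.

    path=None means "use the PATH of the current process"; pass an explicit
    os.pathsep-separated string to test a different search path.
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

Running `missing_executables(["tophat", "bowtie-inspect", "samtools"])` inside a cluster job shows what the job's environment can see. Anything reported as missing has to come either from PATH or from the environment that a `<requirement>` entry sets up.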
