user and dataset management with LDAP: some questions
by Louise-Amélie Schmitt
Hello everyone
I have a couple of questions regarding user and dataset management.
1) We use LDAP for user registration and login. Would it be possible
to automatically retrieve the LDAP users' groups and create the
corresponding groups in Galaxy (and, of course, put each user in their
respective groups)? See the sketch after this list for the kind of
lookup I mean.
2) Is it possible, still using LDAP, to delete a user and all their
datasets?
3) Is it possible to automatically delete, for instance, any dataset
that was added more than a couple of months ago?
4) Is there a not-too-intrusive way to add a column to the user list
showing the disk space each user consumes?
5) I tried to see how the API works but I have to admit I didn't get a
thing. I read the scripts/api/README file, and there I saw that one
needs to access the user preferences to generate an API key. What is its
purpose? Is there a way to do this when using LDAP (and therefore having
no access to that key generator)?
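(For question 1, a minimal sketch of the lookup I have in mind, assuming
a POSIX-style schema where group entries carry memberUid attributes; the
server URI, base DN, and uid below are made-up placeholders:)

    import ldap

    LDAP_URI = "ldap://ldap.example.org"        # placeholder server
    GROUP_BASE = "ou=Groups,dc=example,dc=org"  # placeholder base DN

    def groups_for(uid):
        # Return the name of every group listing this uid as a member.
        conn = ldap.initialize(LDAP_URI)
        conn.simple_bind_s()  # anonymous bind; pass credentials if required
        results = conn.search_s(GROUP_BASE, ldap.SCOPE_SUBTREE,
                                "(memberUid=%s)" % uid, ["cn"])
        return [attrs["cn"][0] for dn, attrs in results]

    print(groups_for("jdoe"))  # e.g. ['ngs', 'staff']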
Sorry, this is a bit random, but I'm kind of drowning here since I'm not
used to working with applications this huge. Thanks for your help and patience.
Cheers,
L-A
downloading bam and bai files from sam-to-bam wrapper
by Ryan Golhar
In my local instance of Galaxy, I'm trying to download the BAM and BAI
files produced by the SAM-to-BAM tool. I see the links for downloading
the Dataset and the bam_index.
BTW - I'm running Safari on Mac OS X.
When I download the Dataset, I get the BAM file. When I download the
bam_index, it gets renamed to Galaxy13-[SAM-to-BAM_on...].bam_index.html
The problem is the extension: it is .bam_index.html but should be
.bam.bai. The BAM file itself gets saved correctly with .bam.
I checked the SAM-to-BAM XML wrapper but couldn't determine where this
filename comes from or how to fix it. Has anyone else run into this?
Multiple galaxy instances
by Jean-Baptiste Denis
Hello everybody,
I'm in the process of providing Galaxy for multiple teams. I've already
set up a testing instance using the production setup page on the wiki
(Apache + SGE), and judging by the users' feedback it works quite well.
This setup is typically used for NGS work, dealing with data on an NFS
share without uploading it to the instance.
Why do I need multiple instances? Maybe I'm not using Galaxy correctly.
Correct me if I'm wrong.
My goal is to delegate the management of libraries/datasets to a Galaxy
admin of each team from the beginning: I do NOT want a SINGLE
independent super admin to manage access for multiple teams; it
doesn't scale.
The Galaxy instance and the underlying "galaxy" system user must have
access to the NGS data on the NFS (v3) share. This means that the galaxy
user must be in a group that has access to the data. I can then delegate
the process of managing datasets and libraries to a dedicated Galaxy
admin. This setup works quite well with a single instance. My job
as a sysadmin is reduced to Galaxy setup and maintenance: I'm not
involved in the library/dataset management.
This setup breaks down, however, if there is another team with data they
don't want to share with others (don't blame me for that): the galaxy
system user must access the data of the first team AND the second team,
which means the Galaxy admin of each team could access everything.
One solution to this problem would be an independent Galaxy super admin
with access to everything, who manages data access for each team. I
don't like this solution; as I said, it doesn't scale.
So another way to deal with this is to give each team its own Galaxy
instance (each running as a distinct galaxy system user) with a
dedicated Galaxy admin. Two possibilities:
- N Galaxy trees, each with its own tuned universe_wsgi.ini file
(dedicated path, port, database, etc.). The problem here is on the
sysadmin side: the update process must be repeated N times.
- A single Galaxy tree and N tuned universe_wsgi.ini files (dedicated
path, port, database, etc.). This seems best to me, but I need to know
whether Galaxy's internals can handle that kind of setup (see the sketch
after this list for what I mean).
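(To make the second possibility concrete, here is a hypothetical
per-team config excerpt for one shared tree; every path, port, and
connection string below is a made-up placeholder:)

    ; universe_team_a.ini -- one such file per team, same shared tree
    [server:main]
    use = egg:Paste#http
    host = 0.0.0.0
    port = 8081                    ; unique port per team

    [app:main]
    ; each team gets its own database and storage locations
    database_connection = postgres://galaxy_a:secret@dbhost/galaxy_team_a
    file_path = /nfs/team_a/galaxy/database/files
    new_file_path = /nfs/team_a/galaxy/database/tmp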
What do you think? Any input, remarks, or advice would be welcome!
Regards,
Jean-Baptiste
tmp directory not cleaned up?
by Ryan Golhar
I just noticed a lot of files in my Galaxy tmp directory. Since this
isn't a system tmp directory, the system cron scripts don't clean it up.
Is there a Galaxy cron script that can be used to clean up this
directory? (Something like the sketch below is what I have in mind.)
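(A minimal sketch of the cleanup I'm after, assuming the directory is
the one set by new_file_path; the path and the seven-day cutoff are
placeholders:)

    import os
    import time

    TMP_DIR = "/path/to/galaxy/database/tmp"  # placeholder; new_file_path
    MAX_AGE = 7 * 24 * 3600                   # seven days, in seconds

    now = time.time()
    for name in os.listdir(TMP_DIR):
        path = os.path.join(TMP_DIR, name)
        # Only remove plain files old enough to be considered stale.
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE:
            os.remove(path)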
Reload ".loc" files without restarting Galaxy system?
by Luobin Yang
Hi,
The "reload a tool's configuration" menu avoids restarting Galaxy system
when a tool's web interface changed, but it doesn't seem to reload the
".loc" files that a tool's XML file uses. I am wondering if it's possible to
reload the ".loc" file also?
Thanks,
Luobin
Filter data on any column and missing values
by Peter Cock
Hi all,
I have just found a problem using the "Filter data on any column using
simple expressions" tool, i.e. the files tools/stats/filters.xml and
tools/stats/filters.py
I have a six-column tabular file like this, where I have used \t for a
tab and \n for the newlines:
#ID\tHMM_Sprob_score\tSP_len\tRXLR_start\tEER_start\tRXLR?\n
gi|301087619|ref|XP_002894699.1|\t0.990\t21\t54\t64\tY\n
gi|301087623|ref|XP_002894700.1|\t0.997\t23\t\t\tN\n
gi|301087628|ref|XP_002894701.1|\t0.000\t24\t\t\tN\n
Breakdown of my data:
Column 1 - ID, mandatory string
Column 2 - HMM_Sprob_score, mandatory float
Column 3 - SP_len, mandatory integer
Column 4 - RXLR_start, optional integer
Column 5 - EER_start, optional integer
Column 6 - RXLR?, mandatory string (Y or N)
Notice that in my output, columns 4 and 5 can be empty or an integer.
I'm trying to filter this file using c6=='Y', i.e. column six is a
yes. This works (one row of output), but Galaxy tells me:
Info: Filtering with c6=='Y',
kept 100.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"
Then if I try to filter using c6=='N', i.e. column six is a no, it
fails (zero rows of output instead of three) and tells me:
kept 0.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"
Digging into the code, tools/stats/filters.py is given the list of
column types by Galaxy and (regardless of which columns are to be
used) attempts to cast them to integers, floats, etc.
It looks like Galaxy has decided that my columns 4 and 5 are integers
(based on the first data row), and therefore filters.py blindly calls
int(...) on all those entries, which fails on the empty cells.
I see several issues:
(a) The filters.py tool only really needs to cast the columns being
used in the filter expression (fairly easy to fix; see the sketch after
this list).
(b) The Galaxy column type detection seems a bit fragile (hard to
really fix without looking at all the data).
(c) Are there other tools that would break in a similar way to filters.py?
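(For (a), a minimal sketch of the defensive casting I mean; the function
name and arguments are hypothetical, not the tool's actual internals:)

    def cast_fields(fields, column_types, used_columns):
        # Cast a field only if its column is referenced by the filter
        # expression and is non-empty, so empty optional cells can no
        # longer raise ValueError from int() or float().
        casted = []
        for i, value in enumerate(fields):
            if i in used_columns and value != "":
                if column_types[i] == "int":
                    value = int(value)
                elif column_types[i] == "float":
                    value = float(value)
            casted.append(value)
        return casted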
Peter
Upload fails, webapp and job runner running on different machines
by Louise-Amélie Schmitt
Hi everyone,
I'm currently trying a new galaxy install with the webapp and the job
runner running on different machines, sharing a nfs volume where the
galaxy files are, and another one where the data is supposed to be
stored, as specified in the file_path and new_file_path values in the
universe_wsgi files. (one for the webapp and one for the runner, as
stated in the doc)
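(For reference, hypothetical excerpts of the two configs; the paths are
made-up placeholders, and I'm assuming the enable_job_running /
track_jobs_in_database split described in the scaling docs:)

    ; universe_wsgi.webapp.ini (excerpt)
    [app:main]
    enable_job_running = False
    track_jobs_in_database = True
    file_path = /nfs/data/galaxy/files   ; same shared path on both hosts
    new_file_path = /nfs/data/galaxy/tmp

    ; universe_wsgi.runner.ini (excerpt)
    [app:main]
    enable_job_running = True
    track_jobs_in_database = True
    file_path = /nfs/data/galaxy/files
    new_file_path = /nfs/data/galaxy/tmp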
Both instances run properly with no error messages, but when I try to
upload a file into a library, here's what happens:
The file's row appears in the library with the "uploading" note.
The job is registered in the database.
And that's all. Nothing changes afterwards. Nothing is saved anywhere.
This can't be a file-too-big issue, since the test file I'm working with
is only 3.5 kB.
I probably stupidly missed something, but I really don't see where the
problem might be.
Any idea?
Cheers
L-A
Using wget/curl with Tool Shed?
by Peter Cock
Hi all,
I'm trying to download and install something from the Galaxy
Tool Shed, http://community.g2.bx.psu.edu/, onto our server.
To do this, I'd like to copy/paste the URL of a tool's tarball
(or an individual XML file, etc.) from the browser on my desktop
into a terminal window logged in to our server, to give to wget
or curl.
e.g. this URL will give me bam_to_bigwig_0.0.1.tar.gz if opened
in a web browser:
http://community.g2.bx.psu.edu/common/download_tool?cntrller=tool&id=3b16...
However, the URL actually goes to an HTML page and then does some
JavaScript magic to redirect to the real file, so I can't easily
use this URL with wget or curl.
Is this something that you could fix in the website code?
For now I can just download the file on my desktop and then
scp it to the server (or ssh -X into the server and run a
browser from there), but this feels like an unnecessary hurdle.
Regards,
Peter
Community server bug and request
by Assaf Gordon
Tiny bug:
When viewing a file inside a tarball, the content type is forced to "text/plain".
That works for most files, but not for image files (example: the "Sequence Logo" tool has a jpg image in its tarball, and the browser displays it as text/plain full of binary characters).
This is done in "./lib/galaxy/webapps/community/controllers/tool.py", function "view_tool_file", line 206:
trans.response.set_content_type( 'text/plain' )
regardless of the file type.
The complicated solution would be to use the "mimetypes" library (http://docs.python.org/library/mimetypes.html) or python-magic (https://github.com/ahupp/python-magic), but a simpler workaround would be to just check the file name for "jpg/png/gif" extensions and set the content type appropriately (see the sketch below).
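(A minimal sketch of the mimetypes route; the helper name is made up,
and only the final set_content_type call mirrors the existing code:)

    import mimetypes

    def content_type_for(filename):
        # guess_type returns (None, None) for unknown extensions,
        # so fall back to the current behaviour of text/plain.
        guessed, _ = mimetypes.guess_type(filename)
        return guessed or 'text/plain'

    # in view_tool_file, instead of the hard-coded call:
    # trans.response.set_content_type( content_type_for( filename ) )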
Feature request:
A way to easily link to tools, so I can send them to other people or point them in the right direction.
Example:
Ross's WebLogo tool (great tool, BTW). If I want to tell someone where it is, the link is:
http://community.g2.bx.psu.edu/tool/browse_tools?sort=name&f-state=approv...
and the direct download link for the tarball is:
http://community.g2.bx.psu.edu/common/download_tool?cntrller=tool&id=c2f6...
Both are probably not stable links (they'll change if a new version is uploaded).
-gordon
"Hide Datasets" bug in workflow
by Assaf Gordon
Hi,
There's a small bug with the "hide dataset" action in the workflow editor: once any dataset has been marked as an output (by clicking the star icon), there's no way to show all the datasets again, even if the user manually un-checks every star icon for every workflow step.
Meaning:
If I have a workflow that *used* to have one step marked as an output, unmarking it (so that no dataset is marked as an output, implying all datasets should be kept in the history) doesn't work: Galaxy insists on hiding all the other datasets.
Steps to reproduce:
1. Import this workflow:
http://main.g2.bx.psu.edu/u/foobar/w/hiddendatasetstest2
2. Edit this workflow.
Notice that only the last two steps are marked as "output".
don't change anything, close with (or without) saving.
3. Run this workflow.
Notice that Steps 1,2,3,5,6 have "Action: Hide this dataset." - as expected.
Only steps 4,7 are marked as "output".
No need to actually execute the workflow.
4. Edit the workflow again.
Remove the "output" mark from steps 4,7 ("Add Column" and "compute").
No dataset is now marked as an output - at least from the user's POV,
it's implied that all datasets should be kept in the history (just like when editing a new workflow).
Save the workflow, close the workflow.
5. Run this workflow.
Notice that steps 1,2,3,5,6 STILL have "Action: Hide this dataset." - this is the bug.
So the only workaround is to run the workflow, then use "extract workflow from history" to get a "clean" copy of the workflow without any datasets hidden.
Thanks to Marek Kudla for meticulously experimenting with this issue.
-gordon