November 2013 - galaxy-dev - lists.galaxyproject.org

Contributing to genome indexes on rsync server
by Brad Chapman 08 Nov '13

Hi all;

Is there a way for community members to contribute indexes to the rsync server? This resource is awesome and I'm working on migrating the CloudBioLinux retrieval scripts to use this instead of the custom S3 buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/gala…

It's great to have this as a public shared resource and I'd like to be able to contribute back. From an initial pass, here are the things I'd like to do:

- Include bowtie2 indexes for more genomes.
- Include novoalign indexes for a number of commonly used genomes.
- Clean up hg19 to include a full canonically sorted hg19, with indexes. Broad has a nice version prepped so GATK will be happy with it, and you need to stick with this ordering if you're ever going to use a GATK tool on it. Right now there is a partial hg19canon (without the random/haplotype chromosomes) and the structure is a bit complex.

What's the best way to contribute these? Right now I have a lot of the indexes on S3. For instance, the hg19 indexes are here:

https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz

I'm happy to format these differently or upload somewhere that would make it easy to include.

Thanks again for setting this up, I'm looking forward to working off a shared repository of data,
Brad

Bowtie2 mm9 index
by Davis, Mary 08 Nov '13

Greetings-

I tried to run an alignment using Bowtie2 and got this message:

format: bam, database: mm9
Could not locate a Bowtie index corresponding to basename "/galaxy/data/mm9/mm9canon/bowtie2_index/mm9canon"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /galaxy/software/linux2.6-x86_64/pkg/bowtie2-2.1.0/bin/bowtie2-align --wrapper basic

I imported Illumina fastq data, groomed them, and then ran the analysis using the built-in mouse index, both full and male, and got the same error message each time. I'm relatively new to this, and don't see what I missed.

Thanks

Mary E. Davis, Ph.D.
Professor
Department of Physiology & Pharmacology
West Virginia University Health Sciences Center
PO Box 9229
Morgantown, WV 26506-9229
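
For anyone who hits the same error on a local instance, one quick check is whether the index files exist under the basename named in the message. The following is only a minimal sketch: it assumes a "small" Bowtie 2 index, which consists of six files with the suffixes listed below (large indexes use ".bt2l" instead), and the helper name missing_bowtie2_files is made up for this illustration.

```python
from pathlib import Path

# Suffixes of the six files that make up a small Bowtie 2 index.
# Large indexes use ".bt2l" instead; this sketch only checks the small form.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_bowtie2_files(basename):
    """Return the expected index files that do not exist for this basename."""
    return [basename + suffix for suffix in BT2_SUFFIXES
            if not Path(basename + suffix).is_file()]


if __name__ == "__main__":
    # Basename copied from the error message above.
    base = "/galaxy/data/mm9/mm9canon/bowtie2_index/mm9canon"
    for path in missing_bowtie2_files(base):
        print("missing:", path)
```

If every file is reported missing, the index was probably never installed at that location, which matches the symptom described in the message.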

Re: [galaxy-dev] Empty bowtie2 output
by IIHG Galaxy Administrator 08 Nov '13

Sending to galaxy-dev instead.

From: Srinivas Maddhi <iihg-galaxy-admin@uiowa.edu>
Date: Friday, November 1, 2013 11:56 AM
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Empty bowtie2 output

In follow-up to http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is there:

- an ETA for a fix to the issue with Bowtie2, in the August 2013 distribution, generating empty output (if it is not already fixed)?
- a suggested workaround (reverting to an older version of that particular tool, etc.) in the meantime?

Thank you.

Unrelated: I wasn't able to determine how to update that thread to request a status, hence the new one.

Fw: "No peek" issue and datasets wrongly reported as "Empty"
by Jean-Francois Payotte 08 Nov '13

Dear Galaxy developers,

I know I am not the only one with this issue, as over time I've stumbled on a few mailing-list threads from other users having the same problem. And I know the recommended solution is to use the -noac mount option ( http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#Unified_Meth… ).

However, the -noac mount option is said to come with a performance trade-off, so when we first ran into this issue (datasets showing "Empty" and "No peek", even though the file on the hard drive is full of content), we implemented the hack found in this thread: http://dev.list.galaxyproject.org/What-s-causing-this-error-td4141958.html#…

In this thread, John suggested adding a "sleep()" in the "finish_job" method of the "galaxy_dist/lib/galaxy/jobs/runners/drmaa.py" file. It worked very well for us: adding a sleep(30) made every job wait 30 seconds before finishing, but at least the "No peek" issue disappeared.

However, since the latest Galaxy updates, this file (drmaa.py) has changed drastically and the "finish_job" method doesn't exist anymore. Hence, I had to remove this hack, hoping that the issue would have disappeared as well. Unfortunately, the "No peek" issue is still there and is causing many headaches for some of our workflow users.

My question is then: can I put this "sleep(30)" in some other place (method and/or file) in order to achieve the same result? I would really like to solve this "No peek" issue without resorting to the "-noac" mount option. Actually, I am not even sure our system administrator would allow it.

Thanks again for your help!
Jean-François
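
One possible variation on the fixed sleep(30) is to poll the output file and stop waiting as soon as it shows up with content. The sketch below is not Galaxy code: wait_for_nonempty is a hypothetical helper, and where it could be called in the current job runner is exactly the open question of this message. It also still depends on the NFS attribute cache expiring, so it bounds the delay rather than removing the cause.

```python
import os
import time


def wait_for_nonempty(path, timeout=30.0, interval=1.0):
    """Poll until `path` exists with non-zero size, or until `timeout` seconds pass.

    Returns True if the file became visible and non-empty, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.stat(path).st_size > 0:
                return True
        except FileNotFoundError:
            pass  # the file is not visible on this host yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A job whose output is legitimately empty would still wait for the full timeout, so the worst case is the same as the fixed 30-second sleep; jobs with visible output finish sooner.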

Re: [galaxy-dev] datacache & bowtie2 for mm9 ?
by Curtis Hendrickson (Campus) 07 Nov '13

Jennifer,

What's the status of the bowtie2/mm9 index on PSU main? When I select tophat2, it offers me mm9 as a choice for built-in indexes. However, when the job runs, I get the following error, indicating the bowtie2/mm9 indexes are missing (below). Any insight into whether this is expected, or what the ETA is until the index would be installed, would be great. I'm trying to reproduce work on PSU I ran on my local galaxy, so that we can link to it for supplemental materials for a paper.

Thanks,
Curtis

PS - I clicked the submit bug button a few days ago, but haven't received a response yet.

Fatal error: Tool execution failed
[2013-10-29 10:13:27] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-10-29 10:13:27] Checking for Bowtie
    Bowtie version: 2.1.0.0
[2013-10-29 10:13:27] Checking for Samtools
    Samtools version: 0.1.18.0
[2013-10-29 10:13:27] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (/galaxy/data/mm9/mm9full/bowtie2_index/mm9full.*.bt2)

From: Jennifer Jackson [mailto:jen@bx.psu.edu]
Sent: Friday, September 20, 2013 4:00 PM
To: Curtis Hendrickson (Campus)
Subject: Re: [galaxy-dev] datacache & bowtie2 for mm9 ?

Thanks Curtis,

I am actually working to try to get mm9 out there right now. No promises, but is just one (well, three, including variants)! If technical is a go, then will do it. Ideally others soonish. We'll see.

The last news brief has help for the Data manager, it may be that you need to do some config changes to get it going. I am certainly no expert - this is Dan's and under active development - but is where I would start.

Jen

On 9/20/13 1:25 PM, Curtis Hendrickson (Campus) wrote:

Thanks for the rapid reply! I have some questions and comments, but need to read up on Data Managers (that admin page seems non-functional in our local galaxy, despite being on latest code) first.

Regards,
Curtis

From: Jennifer Jackson [mailto:jen@bx.psu.edu]
Sent: Friday, September 20, 2013 2:34 PM
To: Curtis Hendrickson (Campus)
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] datacache & bowtie2 for mm9 ?

Hello Curtis,

The datacache was originally pointed to the data staging area and is now pointed to the data published area. The difference is that the published area contains data and location (.loc) files that are in synch and have completed final testing. It is your choice about whether to use the staged-only data - it depends how risk tolerant your project is and if you plan on testing. But, that said, I think it is almost certainly fine or our team wouldn't have staged it yet. A vanishingly small number of datasets are pulled back once they make it to staging, and this is why we were comfortable pointing datacache there in the first place (we were unable to point to the published area at first, but wanted to make the data available ASAP).

Going forward - I can let you know that these indexes are very easy to create: one command-line execution, then add one line to the associated .loc file. Instructions are here, see "Bowtie and Tophat": http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup

For one or a few genomes, this is not a problem. For hundreds of genomes with variants, it can become tedious even with helper tools, and in our case the processing interacted with disk that was undergoing changes (as we have been working on system configuration most of the summer). Also, with the Data Manager now available, creating batch indexes for use via rsync became a lower priority. Even so, I would expect more indexes to be fully published once the final configuration is in place, as many are already staged or close to being staged (watch the yellow banner on Main).

Hopefully this helps to explain the data, guides you to making an informed decision, and aids with creating your own indexes as needed,

Thanks!
Jen
Galaxy team

On 9/18/13 1:04 PM, Curtis Hendrickson (Campus) wrote:

Folks,

First, I wanted to thank you for making the datacache available (http://wiki.galaxyproject.org/Admin/Data%20Integration; rsync://datacache.g2.bx.psu.edu). It's a great resource.

However, what is the best way to stay abreast of changes to what's in datacache, and understand how these indexes are computed? We are currently upgrading to bowtie2, but I notice that the bowtie2 indices for mm9, which used to be in rsync://datacache.g2.bx.psu.edu/indexes/mm9/mm9*/bowtie2_index, have been removed, and only the hg19 genome has bowtie2 indices. Why only that one, and not the others? Where are the scripts you use to make these indices, in case I want to create bowtie2 indices for other genomes?

So, how do I find out *why* they were removed? (Can I safely use the copy I have, or was there a problem with them?) More generally, how do I understand the policies and logic behind the datacache indices, and be notified of changes, short of running my own periodic rsync/diff?

Finally, since I'm doing "reproducible research", is anything planned for systematically versioning genome indices, so I can easily tell what version of a system (i.e., what BWA version) was used to create the index, and be sure that an index will not suddenly disappear?

Thanks,
Curtis
Research Associate/CTSA-Informatics Team
University of Alabama at Birmingham

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

--
Jennifer Hillman-Jackson
http://galaxyproject.org
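
The "one command-line execution, then add one line to the associated .loc file" step described in the quoted reply can be sketched roughly as follows. This is an illustration under assumptions, not the Galaxy team's script: it only assembles the bowtie2-build command (reference FASTA, then index basename) and a tab-separated .loc entry. The column order (build id, dbkey, display name, index basename) is assumed to match the sample bowtie2_indices.loc shipped with Galaxy and should be checked against your own copy; the FASTA path and display name are hypothetical.

```python
import shlex


def bowtie2_build_command(fasta, index_base):
    """Command line that builds a Bowtie 2 index: bowtie2-build <fasta> <index_base>."""
    return ["bowtie2-build", fasta, index_base]


def loc_line(build_id, dbkey, display_name, index_base):
    """One tab-separated .loc entry (column order is an assumption, see above)."""
    return "\t".join([build_id, dbkey, display_name, index_base])


if __name__ == "__main__":
    fasta = "/galaxy/data/mm9/seq/mm9full.fa"  # hypothetical location of the FASTA
    base = "/galaxy/data/mm9/mm9full/bowtie2_index/mm9full"  # basename from the error above
    print(shlex.join(bowtie2_build_command(fasta, base)))
    print(loc_line("mm9full", "mm9", "Mouse (mm9) full", base))
```

The command is printed rather than executed so it can be reviewed first; passing the same list to subprocess.run would run it on a host where bowtie2-build is installed.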

Comments on and a fix for John Chilton's tool-dependency-resolver-plugins
by Guest, Simon 07 Nov '13

Hi John,

This is in regard to this: https://bitbucket.org/galaxy/galaxy-central/pull-request/228/tool-dependenc…

Overall, this is very useful, just what I need, thanks. I'd *really* like to see this feature in the mainline Galaxy. Is there some voting necessary on Trello to achieve this, or is it enough to be enthusiastic here?

I tested the ModuleDependencyResolver, and fixed three problems:

1. Fixed up module loading to work properly. The problem is that 'module' is not a first-class command, it's a shell function. And it only works from interactive shells. The solution is to use the underlying modulecmd command. This requires deeper knowledge in the modules resolver of how environment modules work, which obviates the DEFAULT_MODULE_COMMAND and the flexibility to override it.

2. Made versionless fallback work, i.e. use the matching version if it exists, and only fall back to a generic match if it doesn't.

3. Enhanced the DirectoryModuleChecker to look along the modulepath, not just in a single directory. The default path is initialised appropriately from the environment variables MODULEPATH and MODULESHOME, as per module(1). This can be overridden with the attribute modulepath rather than directory in the config file.

Fix attached - I presume a Mercurial export is all you need?

It may be better to default prefetch to false (but I didn't change that). Otherwise the Galaxy server needs restarting after new system packages become available.

Now, there's one more thing required, which I'm not sure how to achieve. I intend to run with this config:

<dependency_resolvers>
  <modules prefetch="false" versionless="true"/>
</dependency_resolvers>

So in particular I'm not interested in tool_shed_packages. However, when I install from the toolshed, say, the emboss tool, it still downloads the source tarball and tries to compile it locally (which fails, as I don't have make installed on my production Galaxy, nor do I want it).

The emboss tool status in the "Manage installed tool shed repositories" list is "Installed, missing tool dependencies", but actually my installed modules mean the tool dependencies are satisfied. The behaviour I'm after is not even to try the actions in a tool_dependency.xml package spec in the toolshed if I have dependency resolvers configured without tool_shed_packages.

What are your thoughts on that?

cheers,
Simon
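
Points 2 and 3 above (prefer an exact version, fall back to a versionless match, and search every directory on the module path) can be illustrated with a small stand-alone sketch. This is not the attached patch and not Galaxy's resolver code: find_module is a hypothetical helper, it assumes the usual layout of one directory per module with one modulefile per version, and it reads only MODULEPATH (the patch described above also consults MODULESHOME).

```python
import os


def find_module(name, version=None, modulepath=None):
    """Return 'name/version' or 'name' for the first match on the module path, else None.

    An exact version match in any directory wins. Only if no directory has it
    does the search fall back to the module name without a version.
    """
    if modulepath is None:
        modulepath = os.environ.get("MODULEPATH", "")
    dirs = [d for d in modulepath.split(os.pathsep) if d]

    if version is not None:
        for d in dirs:
            if os.path.isfile(os.path.join(d, name, version)):
                return f"{name}/{version}"

    # Versionless fallback: the module exists in some version.
    for d in dirs:
        if os.path.exists(os.path.join(d, name)):
            return name
    return None
```

Scanning the directories at lookup time, rather than once at startup, corresponds to the prefetch="false" behaviour discussed above: newly installed modules are found without restarting the server.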

Installing Galaxy behind an Apache proxy using mod_auth_cas for user auth
by Sandra Gesing 07 Nov '13

Dear all,

I would like to set up a local Galaxy instance behind an Apache server with our local CAS for authentication. It would be great if you could give me a hint for the httpd.conf.

My problem is that after authenticating against CAS in the browser, I get the following error message, and REMOTE_USER doesn't seem to be in the HTTP header for Galaxy (I can see REMOTE_USER in the access_log of Apache but no longer in the paster.log of Galaxy).

"Access to Galaxy is denied

Galaxy is configured to authenticate users via an external method (such as HTTP authentication in Apache), but a username was not provided by the upstream (proxy) server. This is generally due to a misconfiguration in the upstream server."

I know that the same question was already asked in the following post, but I haven't seen an option to extend the post and I haven't found an answer.

http://dev.list.galaxyproject.org/Installing-Galaxy-behind-an-Apache-proxy-…

Any help is much appreciated.

Many thanks,
Sandra
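
The symptom described here (REMOTE_USER visible in Apache's access_log but absent on the Galaxy side) is consistent with how proxying works in general: REMOTE_USER is a variable inside Apache, and the backend only sees it if the proxy copies it into a request header. The sketch below is not Galaxy's middleware. It is a minimal WSGI wrapper, with a made-up name, showing the kind of check that produces such an "access denied" response, on the assumption that the backend expects a request header called REMOTE_USER, which WSGI exposes as the environ key HTTP_REMOTE_USER.

```python
def require_remote_user(app, header="HTTP_REMOTE_USER"):
    """Wrap a WSGI app and reject requests that carry no remote-user header."""
    def middleware(environ, start_response):
        if not environ.get(header):
            body = b"Access denied: no username provided by the upstream proxy\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # The header is present, so hand the request to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Under that assumption, the thing to verify in httpd.conf is that the authenticated username is explicitly added as a request header before the request is proxied.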

Dynamic tool configuration
by Biobix Galaxy 07 Nov '13

Hi all,

We are working on a galaxy tool suite for data analysis. We use a sqlite db to keep result data centralised between the different tools. At one point the configuration options of a tool should depend on the rows within a table of the sqlite db that is the output of the previous step. In other words, we would like to be able to set selectable parameters based on an underlying sql statement.

If sql is not possible, an alternative would be to output the table content into a txt file and subsequently parse the txt file instead of the sqlite db within the xml configuration file.

When looking through the galaxy wiki and mailing lists I came across the <code> tag, which would be ideal: we could run a python script in the background to fetch data from the sqlite table. However, that function is deprecated.

Does anybody know of other ways to achieve this?

Thanks!
Jeroen

Ir. Jeroen Crappé, PhD Student
Lab of Bioinformatics and Computational Genomics (Biobix)
FBW - Ghent University
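
The txt-file alternative mentioned above could look something like the following sketch, which the upstream tool would run as its last step. Everything here is an assumption for illustration: dump_options is a made-up helper, the table and column names are placeholders, and how the downstream tool's XML then reads the resulting file is left open, as in the message.

```python
import sqlite3


def dump_options(db_path, table, column, out_path):
    """Write the distinct values of one column to a text file, one per line."""
    # Table and column names cannot be bound as SQL parameters, so they are
    # quoted as identifiers here; pass only names you control.
    query = 'SELECT DISTINCT "{c}" FROM "{t}" ORDER BY "{c}"'.format(
        c=column.replace('"', '""'), t=table.replace('"', '""'))
    conn = sqlite3.connect(db_path)
    try:
        values = [row[0] for row in conn.execute(query)]
    finally:
        conn.close()
    with open(out_path, "w") as handle:
        for value in values:
            handle.write(f"{value}\n")
    return values
```

Writing the options as a separate output dataset keeps the sqlite file itself out of the tool configuration, at the cost of one extra file per step.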

Bug: Two copies of wiggle_to_simple.xml
by Peter Cock 07 Nov '13

Hi all,

There are two copies of the wiggle_to_simple tool in the main repository, and this duplication appears to have happened back in 2009.

$ grep wiggle_to_simple tool_conf.xml.sample
<tool file="filters/wiggle_to_simple.xml" />
<tool file="stats/wiggle_to_simple.xml" />

$ diff tools/filters/wiggle_to_simple.py tools/stats/wiggle_to_simple.py
(no changes)

$ diff -w tools/filters/wiggle_to_simple.xml tools/stats/wiggle_to_simple.xml
15,18d14
< <test>
< <param name="input" value="3.wig" />
< <output name="out_file1" file="3_wig.bed"/>
< </test>

The tools/filters/wiggle_to_simple.xml version has Windows newlines, and 2 tests.
The tools/stats/wiggle_to_simple.xml version has Unix newlines, but only 1 test.

I would therefore suggest merging the two (Unix newlines, both tests).

Peter
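
When checking other suspected duplicates, the newline difference noted above can hide whether two files really have the same content. A minimal sketch of a comparison that ignores only the line-ending style (the helper name is made up; unlike diff -w it does not ignore other whitespace):

```python
def same_ignoring_newlines(path_a, path_b):
    """True if two files are identical once CRLF and CR line endings become LF."""
    def normalised(path):
        with open(path, "rb") as handle:
            return handle.read().replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalised(path_a) == normalised(path_b)
```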

Security vulnerability in Galaxy filtering tools
by Nate Coraor 07 Nov '13

A security vulnerability was recently discovered by John Chilton in Galaxy's "Filter data on any column using simple expressions" and "Filter on ambiguities in polymorphism datasets" tools that can allow for arbitrary execution of code on the command line.

The fix for these tools has been committed to the Galaxy source. The timing of this commit coincides with the next Galaxy stable release (which has also been pushed out today).

To apply the fix and simultaneously update to the new Galaxy stable release, ensure you are on the stable branch and upgrade to the latest changeset:

% hg branch
stable
% hg pull -u

For Galaxy installations that administrators are not yet ready to upgrade to the latest release, there are three workarounds.

First, for Galaxy installations running on a relatively new version of the stable release (e.g. release_2013.08.12), Galaxy can be updated to the specific changeset that contains the fix. This will include all of the stable (non-feature) commits that have been accumulated since the 8/12 release plus any new features included with (and prior to) the 8/12 release, but without all of the new features included in the 11/4 release. Ensure you are on the stable branch and then upgrade to the specific changeset:

% hg pull -u -r e094c73fed4d

Second, the patch can be downloaded and applied manually:

% wget -O security.patch https://bitbucket.org/galaxy/galaxy-central/commits/e094c73fed4dc66b589932e…

and then:

% hg patch security.patch

or:

% patch -p1 < security.patch

Third, the tools can be completely disabled by removing them from the tool configuration file (by default, tool_conf.xml) and restarting all Galaxy server processes. The relevant lines in tool_conf.xml are:

<tool file="stats/dna_filtering.xml" />
<tool file="stats/filtering.xml" />

The full 11/4 Galaxy Distribution News Brief will be available later today and will contain details of changes since the last release.

--nate
Galaxy Team
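
For sites with several instances to patch, the third workaround (removing the two tool entries from the tool configuration) could be scripted along these lines. This is a sketch and not part of the advisory: disable_tools is a made-up helper, it writes to a separate output file so the original can be kept as a backup, and Python's ElementTree drops XML comments when it parses and rewrites a file.

```python
import xml.etree.ElementTree as ET

# The two tool files named in the advisory above.
VULNERABLE = {"stats/dna_filtering.xml", "stats/filtering.xml"}


def disable_tools(conf_in, conf_out, files=VULNERABLE):
    """Copy a tool configuration, dropping <tool> elements whose file attribute is listed.

    Returns the file attributes that were removed.
    """
    tree = ET.parse(conf_in)
    removed = []
    # Materialise the element list first: removing children while iterating
    # over the live tree can skip elements.
    for parent in list(tree.getroot().iter()):
        for child in list(parent):
            if child.tag == "tool" and child.get("file") in files:
                parent.remove(child)
                removed.append(child.get("file"))
    tree.write(conf_out, encoding="utf-8", xml_declaration=True)
    return removed
```

As the advisory says, all Galaxy server processes still need to be restarted after the configuration file has been replaced.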
