December 2011 - galaxy-dev - lists.galaxyproject.org

Re: [galaxy-dev] GalaxyCloudman + CADDSuite
by Greg Von Kuster 01 Dec '11

Hello Marcel,

In the future, please send all questions like this to the galaxy-dev mailing list, as doing so will streamline the process of getting a timely answer. I believe Enis is best able to answer your questions.

Thanks!

On Nov 30, 2011, at 9:29 AM, Marcel Schumann wrote:

> Hi Greg,
>
> I'm currently trying to create a GalaxyCloudman version that includes CADDSuite.
> Thus, I launched GalaxyCloudman as described in your wiki and tried to modify it afterwards.
>
> Well, starting cloudman worked without any problems... so far, so good :-)
> As described on
> http://wiki.g2.bx.psu.edu/Admin/Cloud/Customize%20Galaxy%20Cloud
> I could then log in via ssh as user 'ubuntu' (not as user 'galaxy').
> However, all files of the galaxy installation belong to user and group 'galaxy'.
>
> Thus my question: How should users be able to customize cloudman? Is there some trick by which I can log in as 'galaxy' or do you have any other idea how to make this work? ;-)
>
> Sorry Greg, if you are not the correct contact in this case, but I found no specific contact or mailing list for cloudman. Perhaps you could just forward this mail in that case ...
>
> Cheers,
> Marcel
>
> --
> Marcel Schumann
>
> University of Tuebingen
> Wilhelm Schickard Institute for Computer Science
> Division for Applied Bioinformatics
> Room C313, Sand 14, D-72076 Tuebingen
>
> phone: +49 (0)7071-29 70437
> fax: +49 (0)7071-29 5152
> email: schumann(a)informatik.uni-tuebingen.de

Greg Von Kuster
Galaxy Development Team
greg(a)bx.psu.edu

lims integration
by Craig Blackhart 01 Dec '11

I am newish to Galaxy and trying to learn how I might integrate it with our workflows and LIMS for automated data handling. I am aware of the API and have looked up all the documentation that I could find. However, there are many things I cannot make sense of, and I have not been able to find information to help me out.

I think a good place to start is with how to run workflow_execute.py: what each of the parameters is and where to get the information for them.

Arguments
* API key - got this and understand
* url - got this and understand
* workflow_id - I have created workflows and have been able to find what looks to be a workflow_id by clicking on the workflow name and selecting "Download or Export". It seems this may be correct, is it?
* history - a named history to use? Should this already exist? I have no idea here.
* step=src=dataset_id - ??? I have no idea ???

I have seen how to create data libraries manually at the command line; does this factor in?

If anyone has information they can help me out with, it would be much appreciated.

Thanks

Craig Blackhart
Computer Scientist
Applied Engineering Technologies
Los Alamos National Laboratory
505-665-6588

This message contains no information that requires ADC review
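
For readers puzzling over the same step=src=dataset_id argument, here is a minimal sketch of how such arguments could be split into a per-step input mapping. It is an illustration, not the code of workflow_execute.py itself: the function name parse_step_args, the shape of the resulting dictionary, and the reading of src as a short code naming where the dataset lives (for example a history or a data library) are all assumptions, and the IDs in the example are made up.

```python
def parse_step_args(args):
    """Split 'step=src=dataset_id' strings into a nested mapping.

    Hypothetical helper: the name and the output shape are assumptions
    made for illustration, not taken from the Galaxy source.
    """
    ds_map = {}
    for arg in args:
        parts = arg.split("=")
        if len(parts) != 3:
            raise ValueError(
                "expected step=src=dataset_id, got %r" % (arg,)
            )
        step, src, dataset_id = parts
        # One entry per workflow input step: where the dataset comes
        # from (src) and which dataset it is (id).
        ds_map[step] = {"src": src, "id": dataset_id}
    return ds_map


# Made-up example values: step "38" fed from a library dataset,
# step "39" fed from a history dataset.
example = parse_step_args(["38=ldda=abc123", "39=hda=def456"])
```

Under this reading, each workflow input step gets exactly one such argument, which is where datasets already loaded into a data library would come in.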

Job output not returned from cluster
by Joseph Hargitai 01 Dec '11

Hi,

I was browsing through the list and found many entries for this issue but no definitive answer. We are actually running into this error for simple file uploads from the internal filesystem.

thanks,
joe

synchronous data depositing
by James Ireland 01 Dec '11

Greetings,

I've been attempting to return data to Galaxy via the synchronous data depositing protocol. Using the Biomart, UCSC Table Browser, etc. as examples in the data_source tools directory, I've been able to get the initial GET request to my site just fine. However, when I POST back to galaxy I immediately get a redirect to the welcome page and Galaxy never resubmits back to my site.

I was wondering if there is more to the protocol than is covered here:

http://wiki.g2.bx.psu.edu/Admin/Internals/Data%20Sources

or perhaps configuration I need to perform on my local Galaxy installation to correctly handle the POSTs back to tool_runner? Also, are there any code examples I should be looking at?

Thanks for your help!

-James

--
J Ireland
www.5amsolutions.com | Software for Life(TM)
m: 415 484-DATA (3282)
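
As a rough illustration of the round trip described in this message, the sketch below shows how a data source might read the return address from the initial GET and assemble the fields it posts back. Everything specific here is an assumption rather than something confirmed in this thread: the parameter names GALAXY_URL, URL, URL_method and data_type reflect one reading of how the existing data_source tools behave, build_galaxy_return is a hypothetical helper, and the example hosts are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_galaxy_return(incoming_url, data_url, data_type="tabular"):
    """Return (target, form_body) for the POST back to Galaxy.

    Hypothetical helper. The parameter names used below are
    assumptions about the protocol, not taken from this thread.
    """
    query = parse_qs(urlsplit(incoming_url).query)
    galaxy_urls = query.get("GALAXY_URL")
    if not galaxy_urls:
        raise ValueError("request did not carry a GALAXY_URL parameter")
    form = {
        "URL": data_url,         # where Galaxy should fetch the data
        "URL_method": "get",     # how Galaxy should fetch it
        "data_type": data_type,  # format to assign to the new dataset
    }
    return galaxy_urls[0], urlencode(form)


# Placeholder hosts, for illustration only.
target, body = build_galaxy_return(
    "http://example.org/export"
    "?GALAXY_URL=http%3A%2F%2Flocalhost%3A8080%2Ftool_runner%3Ftool_id%3Dmy_source",
    "http://example.org/data.tsv",
)
```

If the real exchange differs (for instance in which fields tool_runner expects, or whether a tool_id must be echoed back), a mismatch of that kind would be one place to look when the POST ends in a redirect to the welcome page.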

Re: [galaxy-dev] Local installation of the tool shed
by Greg Von Kuster 01 Dec '11

Hello Louise-Amelie,

On Dec 1, 2011, at 4:08 AM, Louise-Amélie Schmitt wrote:

> Yes this fixes the problem, it should work fine now :)
>
> Thanks a lot!
>
> Hehe, next issues:
>
> 1) When I have my repo created and I click on the "Upload files to repository", I get a Not found error in the browser:
> Not Found
> The requested URL /toolshed/upload/upload was not found on this server.

This is probably due to your apache rewrite rule:

RewriteRule ^/toolshed/upload/(.*) /home/galaxy/galaxy-dev/static/automated_upload/$1 [L]

Is this something proprietary you have set up on your local galaxy instance? Try removing it from your rewrite rules for your local tool shed.

> 2) When we try to access our local toolshed from our Galaxy instance, it appears in the "Accessible tool sheds" list along with your two public repos, but when we click on it, the "Valid repositories" list is empty. Is this a bug or does a repo have to actually contain files so it can appear in this list?

The list of valid repositories will only include repositories that have content that is valid for a Galaxy instance. These repositories are defined as either "valid" or in some cases "downloadable" (I'm working to replace the latter with the former). See the following sections of the tool shed wiki for details:

http://wiki.g2.bx.psu.edu/Tool%20Shed#Repository_revisions:_downloadable_to…
http://wiki.g2.bx.psu.edu/Tool%20Shed#Automatic_installation_of_Galaxy_tool…

> 3) Where should the hgweb.config be?

This file should be left in the Galaxy root install directory.

> Actually, we have absolute paths since in the community_wsgi.ini we set the file_path option to an absolute path.

I advise against this - I'm fairly certain it will pose problems at some point. Tool shed paths should be relative. Here are some example entries in one of my local tool sheds:

[paths]
repos/test/filter = database/community_files/000/repo_1
repos/test/workflow_with_tools = database/community_files/000/repo_2
repos/test/heteroplasmy_workflow = database/community_files/000/repo_3

> Thanks for your patience!
> L-A

Greg Von Kuster
Galaxy Development Team
greg(a)bx.psu.edu
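
The advice above about keeping tool shed paths relative can be checked mechanically. The following is a minimal sketch, assuming hgweb.config is plain INI text that Python's standard configparser can read; the helper name absolute_repo_paths is hypothetical, and the sample text simply repeats the [paths] entries quoted in the message.

```python
import configparser
import posixpath

SAMPLE = """\
[paths]
repos/test/filter = database/community_files/000/repo_1
repos/test/workflow_with_tools = database/community_files/000/repo_2
repos/test/heteroplasmy_workflow = database/community_files/000/repo_3
"""


def absolute_repo_paths(hgweb_text):
    """Return the [paths] entries whose target is an absolute path.

    Hypothetical helper for illustration; an empty result means every
    repository path is relative, as the message recommends.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep repository names exactly as written
    parser.read_string(hgweb_text)
    return {
        name: path
        for name, path in parser.items("paths")
        if posixpath.isabs(path)
    }


offenders = absolute_repo_paths(SAMPLE)
```

Run against the sample entries, the result is empty; an entry pointing at something like /home/galaxy/... would be reported.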

Re: [galaxy-dev] [galaxy-user] Using Galaxy Cloudman for a workshop
by Clare Sloggett 01 Dec '11

Hi Jeremy, Enis,

That makes sense. I know I can configure how many threads BWA uses in its wrapper, with bwa -t. But, is there somewhere that I need to tell Galaxy the corresponding information, ie that this command-line task will make use of up to 4 cores?

Or, does this imply that there is always exactly one job per node? So if I have (for instance) a cluster made of 4-core nodes, and a single-threaded task (e.g. samtools), are the other 3 cores just going to waste or will the scheduler allocate multiple single-threaded jobs to one node?

I've cc'd galaxy-dev instead of galaxy-user as I think the conversation has gone that way!

Thanks again,
Clare

On Fri, Nov 18, 2011 at 2:36 PM, Jeremy Goecks <jeremy.goecks(a)emory.edu> wrote:
>
>> On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks <jeremy.goecks(a)emory.edu> wrote:
>>
>>> Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously.
>>>
>>
>> Actually, one other question - this paragraph makes me realise that I
>> don't really understand how Galaxy is distributing jobs. I had thought
>> that each job would only use one node, and in some cases take
>> advantage of multiple cores within that node. I'm taking a "node" to
>> be a set of cores with their own shared memory, so in this case a VM
>> instance, is this right? If some types of jobs can be distributed over
>> multiple nodes, can I configure, in Galaxy, how many nodes they should
>> use?
>
> You're right -- my word choices were poor. Replace 'node' with 'core' in my paragraph to get an accurate suggestion for resources.
>
> Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different cluster nodes. Jobs that require multiple cores typically run on a single node. Enis can chime in on whether CloudMan supports job submission over multiple nodes; this would require setup of an appropriate parallel environment and a tool that can make use of this environment.
>
> Good luck,
> J.

--
E: sloc(a)unimelb.edu.au
P: 03 903 53357
M: 0414 854 759
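
The sizing arithmetic in this exchange (4 cores per BWA job, 50 simultaneous users, 200 cores in total) can be written out as a small calculation. This is a sketch under stated assumptions: each job stays on a single node, as the quoted reply says is typical, and the scheduler is assumed to pack as many jobs onto a node as its cores allow, which is exactly the point the message leaves open. The function name cluster_size is hypothetical.

```python
import math


def cluster_size(users, cores_per_job, cores_per_node):
    """Return (total_cores, nodes) needed to run one job per user at once.

    Assumes a job never spans nodes and that the scheduler fills each
    node with as many jobs as fit; both are assumptions, not facts
    established in the thread.
    """
    if cores_per_job > cores_per_node:
        raise ValueError("a single job would not fit on one node")
    total_cores = users * cores_per_job
    jobs_per_node = cores_per_node // cores_per_job
    nodes = math.ceil(users / jobs_per_node)
    return total_cores, nodes


# 50 workshop users each running a 4-core BWA job on 4-core nodes.
bwa_cores, bwa_nodes = cluster_size(users=50, cores_per_job=4, cores_per_node=4)
```

With 4-core nodes this gives 200 cores on 50 nodes. For a single-threaded tool the same function would pack four jobs per node, but only if the scheduler really does allocate more than one job to a node.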
