June 2012 - galaxy-dev - lists.galaxyproject.org

Python revs for latest Galaxy and new Cloudman ?
by Greg Edwards 17 Jun '12

17 Jun '12

Hi, Just checking before I plunge in and upgrade various things. Is the latest Galaxy and Cloudman ( http://wiki.g2.bx.psu.edu/News/NewCloudManRelease ) ok on Python 2.7.3 ? And can I ask what Python rev is incorporated in the new Cloudman AMI ? Presumably a little earlier than 2.7.3. My current 861460482541/galaxy-cloudman-2011-03-22 is Python 2.6.5. Current local Mac is 2.6.1. Sorry about somewhat dim questions. I'm starting a big Python learning binge and want to set up a local Galaxy on Mac, an AWS Galaxy, and a learning environment in the most useful way. Thanks. -- Greg Edwards, Port Jackson Bioinformatics gedwards2(a)gmail.com

2 1

adding IDR to Galaxy
by Quang Trinh 16 Jun '12

16 Jun '12

Hi dev, Just wondering if there is plan to add IDR ( please see the link below ) to Galaxy Amazon AMI? https://sites.google.com/site/anshulkundaje/projects/idr IDR is being used by both modENCODE and ENCODE projects for ChiP-seq analysis. Thanks, Q

1 0

error "Invalid sample id (None)" when transfering data from sequencer
by TerAvest, Emily 15 Jun '12

15 Jun '12

Hi, I am running the May update of galaxy and attempting to transfer data from my sequencer to galaxy. As soon as I hit the transfer button I get an error "Invalid sample id (None)" I can't find anything suspicious in the log file. Is anyone else experiencing this? What can I do to fix this? Thanks Emily This email message and any electronic files transmitted with it relating to the official business of Codexis, Inc. and its affiliates are proprietary, strictly confidential, and may be legally privileged. This email message is intended solely for the use of the individual(s) to whom it is addressed. If you believe you have received this email in error, please delete this email from your system and notify the sender immediately. If you are not the intended recipient, you should not disclose, distribute, use or copy this email or take any action in reliance on its contents.

1 0

ftp connection issue
by ajtong＠ucla.edu 15 Jun '12

15 Jun '12

Hello, I'm a Galaxy user, and recently have been having trouble uploading my files to Galaxy via ftp. I've used both Filezilla and Cyberduck, and have been getting the same error messages: ftp error 530: too many connections to the client (3) however, i have disconnected from all ftp applications and have logged out of all of my Galaxy accounts. I have even changed the password, which should have logged me out of all locations. Is there any way that you can kill all of my connections? This appears to be account specific as my labmate is able to upload his files (same size) to his account. Thanks, Ann-Jay Tong

1 0

Using $NSLOTS in tools to control thread number
by Peter Cock 15 Jun '12

15 Jun '12

Hello all, I'm wondering if it is sensible to make Galaxy tools automatically use the environment variable $NSLOTS to automatically adjust their number of threads? Using $NSLOTS works on SGE, but is it generally used on other clusters? The idea here is rather than hard coding the number of threads in a tool or its XML file, which may need to be altered for different local setups, and it can be specified in universe_wsgi.ini under [galaxy:tool_runners] e.g. By default our SGE allocates one slot, so with the following BWA should use one thread: [galaxy:tool_runners] bwa_wrapper = drmaa://-V/ However, if we ask SGE for 8 slots, the tool should use eight threads: [galaxy:tool_runners] bwa_wrapper = drmaa://-V -pe smp 8/ For this to be truly general, we would need a way to set environement variables for local:// runners. However, we can cope with $NSLOTS being undefined with a little magic in the XML definitions. .e.g For the BWA wrapper this is currently hard coded to use 4 threads: <command interpreter="python"> bwa_wrapper.py --threads="4" ... </command> Instead, this could be something like this: <command interpreter="python"> bwa_wrapper.py #if "$NSLOTS"=="": --threads="4" #else: --threads="$NSLOTS" #end if ... </command> Likewise for the BLAST+ wrappers etc. i.e. If the environment variable is set (e.g. via the cluster settings) use that, otherwise keep the current hard coded default. Would this work in principle on other cluster setups? i.e. Is $NSLOTS sufficiently general? It would be messy but the XML if statement could be expanded to handle a second environment variable as well if needs be. Would the Galaxy team accept a pull request implementing this for the BWA and BLAST+ wrappers? Thanks, Peter

2 4

CloudMan: Command not found
by Jose Navas 14 Jun '12

14 Jun '12

Hi Galaxy Team, I'm modifying my Galaxy CloudMan instance by adding custom tools. I've installed my tools under the /mnt/galaxyTools/tools folder and I've modified the .bashrc files from the sgeadmin and galaxy users to add the needed paths to the PATH and PYTHONPATH variables. When I'm in Galaxy and I try to launch one of my tools, it fails and shows the 'Command not found' error. Where I should add the paths to make Galaxy now where are my executables? Whan I log into the isntance through ssh and I use the galaxy user, it knows where are my executables. What I'm missing? Thanks,Jose

3 5

Query Fastq files for particular sequence elements
by Jane Dorweiler 14 Jun '12

14 Jun '12

Greetings all, I've been trying to find a way to query fastq files for particular sequence elements. Our data was mapped using BWA by our collaborator, and repetitive elements 'ignored', but we are now interested in determining whether a couple specific repetitive elements of interest are differentially represented in the raw read files. Are there any tools that anyone has developed to do anything like this -- and that perhaps I'm simply missing as I explore the available tools? In the short term, I've written a very crude python script to begin exploring the question, but I'm sure there has to be a much better way. If there are no such tools available, I'm hopeful that someone might have some helpful suggestions, or that perhaps it could be explored during the upcoming conference &/or training day in July. Thanks and Best Regards, Jane -- Jane E. Dorweiler, PhD

2 1

Loss of sub-task order with <parallelism>
by Peter Cock 14 Jun '12

14 Jun '12

Dear all, Are there any known issues with the job-splitting code (i.e. the new <parallelism> tags in the tool wrappers) and the order of the sub-jobs? I've noticed on our production Galaxy (a bit old now, 6799:40f1816d6857 from 7 March) two apparent problems here. I added a diagnostic print statement to the jobs' stdout giving the node number and SGE job number. When viewing the combined stdout in Galaxy, the SGE job numbers should (I think) be strictly increasing. That isn't always the case, e.g. here task_7 was added to the queue before task_6: /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_0: Running on n3 as job 27700 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_1: Running on n12 as job 27701 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_2: Running on n8 as job 27702 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_3: Running on n6 as job 27703 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_4: Running on n11 as job 27704 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_5: Running on n10 as job 27705 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_6: Running on n4 as job 27707 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_7: Running on n5 as job 27706 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_8: Running on n9 as job 27708 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_9: Running on n7 as job 27709 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_10: Running on n12 as job 27710 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_11: Running on n9 as job 27711 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_12: Running on n6 as job 27712 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_13: Running on n7 as job 27713 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_14: Running on n4 as job 27714 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_15: Running on n8 as job 27715 /mnt/galaxy/galaxy-dist/database/job_working_directory/004/4055/task_16: Running on n10 as job 27716 In a separate example with 33 sub-tasks, there were two of these inversions, while in yet another example with 33 sub-tasks there was a trio submitted out of order. This non-deterministic behavior is a little surprising, but in itself not an immediate problem. In what appears to be a separate (and more concerning) loss of order, after merging the output file order appears randomized. I would expect the output from task_0, then task_1, ..., finally task_16. I haven't yet worked out what order I am getting, but it isn't this, and neither is it the order from the SGE job numbers (e.g. correct bar one pair switched round). Having looked at lib/galaxy/jobs/runners/tasks.py the source of this behaviour currently eludes me [*]. Has anyone else observed anything like this before? Regards, Peter [*] P.S. I would like to see an upper bound on the sleep_time in method run_job, say half an hour? Otherwise with a group of long running jobs it seems Galaxy may end up waiting a very long time between checks for their completion since it just doubles the wait at each point. I had sometimes noticed a delay between the sub-jobs finishing according to the cluster and Galaxy doing anything about merging it - this is probably why.

2 2

Changing default snapshots
by Jose Navas 14 Jun '12

14 Jun '12

Hi everyone, I tried to create an AMI with my tools integrated on Galaxy, but by default it mounts the snapshots with the tools from Galaxy. How I can change the snapshots mounted by default? i.e. before I start a cluster. Thanks,Jose

1 0

dealing with whitespace string as separator in dynamic options
by Sophie Creno 14 Jun '12

14 Jun '12

Hello, I have a file with fixed-width columns and I would like to use the second column to populate a select list parameter. My problem is that the number of whitespaces changes depending on the content of the first column. Given that split is used in the underlying code, I wanted to give None as separator but I can't. If I put: <options from_dataset="input_file" separator="None"> it's the string "None" that is considered as the separator and if I write: <options from_dataset="input_file" separator=""> then the exception "Empty separator" is raised. Should I put something else to do what I want? If not, would it be possible to consider changing something in the code of dynamic_options.py such as replacing the empty string with None to let the split takes place for example? Thanks in advance, Sophie Creno

1 0