I've noticed that when I try to save a large dataset from my
history (e.g. a SAM file with 17 million lines), memory usage by the
Python process running Galaxy increases dramatically, sometimes to the
point of crashing my server. Even after the file has been saved to my
machine, the memory usage is maintained. Is there any way to clear this?
>> I have a tool that can be run on multiple data formats. But I need to run it differently depending on the format of the input dataset. How can I get the input data format?
>> In the tool's .xml file I have:
>> <param format="interval,lped" name="input1" type="data" label="SNPs">
>> #if $input1.metadata.format=="interval" ...
>> The #if doesn't work; what should go there?
The condition you want to use is this:
#if isinstance( $input1.datatype, $__app__.datatypes_registry.get_datatype_by_extension('interval').__class__):
You can see this example in context by looking at the XML tool definition file for the intersect tool, intersect.xml
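For context, a minimal sketch of how that condition might sit inside a tool definition; the tool id, script name, flags, and output format below are illustrative, not taken from intersect.xml:

<tool id="format_switch_example" name="Format switch example">
  <command>
    #if isinstance( $input1.datatype, $__app__.datatypes_registry.get_datatype_by_extension('interval').__class__):
      my_tool.py --format interval --input $input1 --output $output1
    #else
      my_tool.py --format lped --input $input1 --output $output1
    #end if
  </command>
  <inputs>
    <param format="interval,lped" name="input1" type="data" label="SNPs"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output1"/>
  </outputs>
</tool>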
Let us know if this doesn't solve your problem.
With the new(-ish) post-task workflow actions, if a particular task
has multiple outputs, is it correct that you may only apply an action
to one of the datasets? E.g., I can hide one dataset, but not
another, from a specific task? The interface allows me to specify an
action on multiple outputs, but when I select a different task and then
re-select the task with multiple actions, it only keeps one of each
type. Is this the expected behavior? Thanks!
Masonic Cancer Center
University of Minnesota
Dear Galaxy developers,
we are planning to build a data warehouse for a research center that utilizes multiple high-throughput experimental platforms, e.g. plate-based HTS assays, microarrays of several different types, ChIP-seq, and RNA-seq. We have been thinking of managing the data in a relational database. Galaxy looks attractive to us for its workflow management and data provenance features, e.g. to keep track of how raw data are analyzed to produce normalized & summarized datasets and/or final sets of statistics such as p-values. We wonder how amenable Galaxy would be to integration with a relational data store.
One possible scenario might be to have Galaxy import a dataset from a relational database, run a workflow, then submit the results back to the database with the associated history or link thereto.
Another possibility is to forgo the relational database altogether and do all our data management within Galaxy.
Any thoughts? We don't have much experience with Galaxy and would appreciate insights from those who do.
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427
Hi, I've noticed that a tool I added to my local Galaxy can't be found via the tool search (the default tools can be searched). Are there any tips on how to enable this?
Cogentech - Consortium for Genomic Technologies
via adamello, 16
I installed galaxy-dev using
hg clone -r 60448575467f http://bitbucket.org/galaxy/galaxy-central galaxy-dist
since http://www.bx.psu.edu/hg/galaxy seems down.
I started galaxy, uploaded a fasta file, computed sequence lengths, then sorted on c2.
The sort fails with the error:
Tool execution generated the following error message:
sort: multi-character tab `$\t'
I created a tab-delimited test file with two columns, one letter in the first and one digit in the second, then sorted on c1: I got the same error.
sorter.py makes a system call to the Unix sort command (I have sort (GNU coreutils) 7.4). I tested my shell by running this on the command line:
/usr/bin/sort -f -t $'\t' test-file
and it worked. I modified sorter.py to use /usr/bin/sort, restarted Galaxy... and still got the same error.
So there is something wrong in the way the shell escapes the $'\t' in the sorter.py command line.
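A minimal sketch of one likely fix, assuming sorter.py can be changed to build an argument list and call sort via subprocess instead of a shell: os.system and friends run /bin/sh, and $'\t' is a bash-only quoting form, so a plain sh passes the literal two characters $\t through to sort, which then complains about a multi-character tab. Passing a real tab character in an argument list avoids the shell entirely (the function and file names here are illustrative):

import subprocess

def sort_tab_delimited(infile, outfile, column=2):
    # Pass a literal tab as the field separator; with an argument
    # list no shell is involved, so nothing needs to expand $'\t'.
    with open(outfile, 'w') as out:
        subprocess.check_call(
            ['sort', '-f', '-t', '\t', '-k', '%d,%d' % (column, column), infile],
            stdout=out)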
Any help greatly appreciated.
I've created a new data library and have added two datasets using the
admin interface to provide file paths to two large-ish files. In the
admin interface they're still showing up as "Information: This job is
queued" but no jobs show up in the admin job list, system load is zero,
and I see no mention of the files in the log past the initial
web-requests that kicked off the process.
I do have "track_jobs_in_database = True" and "enable_job_recovery =
True", and I have previously killed off some data-set-addition jobs that
seemed stalled and _were_ using CPU.
Where should I be looking for blocked data set import jobs? Do load
jobs make it into the database too? Is there a table I can clear out to
remove the import jobs that I interrupted by deleting their library
previously? Is there anything I should be looking for in the logs? Is
there a handle I can jiggle?
I'm running rev 2c7acb546d6d.
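In case it helps to poke at the database directly, a minimal sketch assuming the default SQLite setup (database/universe.sqlite) and Galaxy's job table; the table and column names here are assumptions from memory, so verify them against your schema first:

import sqlite3

conn = sqlite3.connect('database/universe.sqlite')
# List jobs that never reached a terminal state; 'state' and
# 'tool_id' are assumed column names on Galaxy's job table.
for row in conn.execute(
        "SELECT id, tool_id, state FROM job "
        "WHERE state NOT IN ('ok', 'error', 'deleted')"):
    print row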
Ry4an Brase 612-626-6575
University of Minnesota Supercomputing Institute
for Advanced Computational Research http://www.msi.umn.edu
I have installed an instance of Galaxy on our server. I configured it to
run the jobs on our cluster (SGE).
Everything works just fine. I can run NGS jobs that run on our cluster
nodes and return results to Galaxy.
But sometimes I have a problem with the job status not being
updated! It can stay flagged as "waiting" and never run or, even worse,
stay flagged as "running".
When I check the logs I can see that the job has finished normally.
As the step is not flagged as "done", I cannot go further with my
analysis. Any help would be welcomed!
Thanks for your great work with Galaxy!
Tanguy LE CARROUR
Galaxy logs sample:
galaxy.jobs DEBUG 2010-08-25 11:59:45,727 dispatching job 8 to sge
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,935 (8) submitting
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,936 (8) command is:
--input1=/srv/GT/www/htdocs/Galaxy/new_galaxy/database/files/000/dataset_9.dat --dbkey=hg19 --ref_file="None" --output1=/srv/GT/www/htdocs/Galaxy/new_galaxy/database/files/000/dataset_10.dat --index_dir=/srv/GT/www/htdocs/Galaxy/new_galaxy/tool-data
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,941 (8) queued in
ims.q queue as 202466
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:48,590 (8/202466) state
change: job is queued and waiting to be scheduled
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:54,594 (8/202466) state
change: job is running
galaxy.jobs.runners.sge DEBUG 2010-08-25 12:02:58,979 (8/202466) state
change: process status cannot be determined
galaxy.jobs.runners.sge DEBUG 2010-08-25 12:02:59,979 (8/202466) state
change: job finished normally
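For what it's worth, the "process status cannot be determined" line is the runner's rendering of DRMAA's undetermined job state, which can show up when qmaster momentarily loses track of a job. A minimal sketch for probing a job's state directly through the python drmaa bindings, to see what DRMAA itself reports (the job id is the one from the log sample above, purely illustrative):

import drmaa

s = drmaa.Session()
s.initialize()
# 202466 is the SGE job id from the log sample above.
print s.jobStatus('202466')  # e.g. drmaa.JobState.RUNNING / DONE / UNDETERMINED
s.exit()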
Tanguy LE CARROUR
Functional Genomics Center Zurich
ETH Zurich / University of Zurich