I've noticed that when I try to save a large dataset from my
history (e.g. a SAM file with 17 million lines), memory usage by the
Python process running Galaxy increases dramatically, sometimes to the
point of crashing my server. Even after the file has been saved to my
machine, the memory usage is maintained. Is there any way to clear this?
>> I have a tool that can be run on multiple data formats. But I need to run it differently depending on the format of the input dataset. How can I get the input data format?
>> In the tool's .xml file I have:
>> <param format="interval,lped" name="input1" type="data" label="SNPs">
>> #if $input1.metadata.format=="interval" ...
>> The #if doesn't work; what should go there?
The condition you want to use is this:
#if isinstance( $input1.datatype, $__app__.datatypes_registry.get_datatype_by_extension('interval').__class__):
You can see this example in context by looking at the XML tool definition file for the intersect tool, intersect.xml
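For context, a minimal sketch of how that condition might sit inside a tool definition; the tool id, script name, flags, and output format below are illustrative, not taken from intersect.xml:

<tool id="format_switch_example" name="Format switch example">
  <command>
    #if isinstance( $input1.datatype, $__app__.datatypes_registry.get_datatype_by_extension('interval').__class__):
      my_tool.py --format interval --input $input1 --output $output1
    #else
      my_tool.py --format lped --input $input1 --output $output1
    #end if
  </command>
  <inputs>
    <param format="interval,lped" name="input1" type="data" label="SNPs"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output1"/>
  </outputs>
</tool>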
Let us know if this doesn't solve your problem.
With the new(-ish) post-task workflow actions, if a particular task
has multiple outputs, is it correct that you may only apply an action
to one of the datasets? E.g., I can hide one dataset, but not
another, from a specific task? The interface allows me to specify an
action on multiple outputs, but when I select a different task and then
re-select the task with multiple actions, it only keeps one of each
type. Is this the expected behavior? Thanks!
Masonic Cancer Center
University of Minnesota
Dear Galaxy developers,
we are planning to build a data warehouse for a research center that utilizes multiple high-throughput experimental platforms, e.g. plate-based HTS assays, microarrays of several different types, ChIP-seq, and RNA-seq. We have been thinking of managing the data in a relational database. Galaxy looks attractive to us for its workflow management and data provenance features, e.g. to keep track of how raw data are analyzed to produce normalized & summarized datasets and/or final sets of statistics such as p-values. We wonder how amenable Galaxy would be to integration with a relational data store.
One possible scenario might be to have Galaxy import a dataset from a relational database, run a workflow, then submit the results back to the database with the associated history or link thereto.
Another possibility is to forgo the relational database altogether and do all our data management within Galaxy.
Any thoughts? We don't have much experience with Galaxy and would appreciate insights from those who do.
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427
Hi, I've noticed that a tool I added to my local Galaxy can't be found via the tool search (the default tools can be searched). Are there any tips on how to enable this?
Cogentech - Consortium for Genomic Technologies
via adamello, 16
I installed galaxy-dev using
hg clone -r 60448575467f http://bitbucket.org/galaxy/galaxy-central galaxy-dist
since http://www.bx.psu.edu/hg/galaxy seems down.
I started galaxy, uploaded a fasta file, computed sequence lengths, then sorted on c2.
The sort fails with the error:
Tool execution generated the following error message:
sort: multi-character tab `$\t'
I created a tab-delimited test file with two columns, one letter in the first and one digit in the second, then sorted on c1: I got the same error.
sorter.py makes a system call to the Unix sort command (I have sort (GNU coreutils) 7.4). I tested my shell by running this on the command line:
/usr/bin/sort -f -t $'\t' test-file
and it worked. I modified sorter.py to use /usr/bin/sort, restarted Galaxy... and still got the same error.
So there is something wrong in the way the shell escapes the $'\t' in the sorter.py command line.
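A minimal sketch of one likely fix, assuming sorter.py can be changed to build an argument list and call sort via subprocess instead of a shell: os.system and friends run /bin/sh, and $'\t' is a bash-only quoting form, so a plain sh passes the literal two characters $\t through to sort, which then complains about a multi-character tab. Passing a real tab character in an argument list avoids the shell entirely (the function and file names here are illustrative):

import subprocess

def sort_tab_delimited(infile, outfile, column=2):
    # Pass a literal tab as the field separator; with an argument
    # list no shell is involved, so nothing needs to expand $'\t'.
    with open(outfile, 'w') as out:
        subprocess.check_call(
            ['sort', '-f', '-t', '\t', '-k', '%d,%d' % (column, column), infile],
            stdout=out)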
Any help greatly appreciated.
I've created a new data library and have added two datasets using the
admin interface to provide file paths to two large-ish files. In the
admin interface they're still showing up as "Information: This job is
queued" but no jobs show up in the admin job list, system load is zero,
and I see no mention of the files in the log past the initial
web-requests that kicked off the process.
I do have "track_jobs_in_database = True" and "enable_job_recovery =
True", and I have previously killed off some data-set-addition jobs that
seemed stalled and _were_ using CPU.
Where should I be looking for blocked data set import jobs? Do load
jobs make it into the database too? Is there a table I can clear out to
remove the import jobs that I interrupted by deleting their library
previously? Is there anything I should be looking for in the logs? Is
there a handle I can jiggle?
I'm running rev 2c7acb546d6d.
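In case it helps to poke at the database directly, a minimal sketch assuming the default SQLite setup (database/universe.sqlite) and Galaxy's job table; the table and column names here are assumptions from memory, so verify them against your schema first:

import sqlite3

conn = sqlite3.connect('database/universe.sqlite')
# List jobs that never reached a terminal state; 'state' and
# 'tool_id' are assumed column names on Galaxy's job table.
for row in conn.execute(
        "SELECT id, tool_id, state FROM job "
        "WHERE state NOT IN ('ok', 'error', 'deleted')"):
    print row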
Ry4an Brase 612-626-6575
University of Minnesota Supercomputing Institute
for Advanced Computational Research http://www.msi.umn.edu
I have installed an instance of Galaxy on our server. I configured it to
run the jobs on our cluster (SGE).
Everything works just fine. I can run NGS jobs that run on our cluster
nodes and return results to Galaxy.
But sometimes I have a problem with the job status not being
updated! It can stay flagged as "waiting" and never run or, even worse,
stay flagged as "running".
When I check the logs I can see that the job has finished normally.
As the step is not flagged as "done", I cannot go further with my
analysis. Any help would be welcomed!
Thanks for your great work with Galaxy!
Tanguy LE CARROUR
Galaxy logs sample:
galaxy.jobs DEBUG 2010-08-25 11:59:45,727 dispatching job 8 to sge
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,935 (8) submitting
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,936 (8) command is:
--input1=/srv/GT/www/htdocs/Galaxy/new_galaxy/database/files/000/dataset_9.dat --dbkey=hg19 --ref_file="None" --output1=/srv/GT/www/htdocs/Galaxy/new_galaxy/database/files/000/dataset_10.dat --index_dir=/srv/GT/www/htdocs/Galaxy/new_galaxy/tool-data
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:47,941 (8) queued in
ims.q queue as 202466
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:48,590 (8/202466) state
change: job is queued and waiting to be scheduled
galaxy.jobs.runners.sge DEBUG 2010-08-25 11:59:54,594 (8/202466) state
change: job is running
galaxy.jobs.runners.sge DEBUG 2010-08-25 12:02:58,979 (8/202466) state
change: process status cannot be determined
galaxy.jobs.runners.sge DEBUG 2010-08-25 12:02:59,979 (8/202466) state
change: job finished normally
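For what it's worth, the "process status cannot be determined" line is the runner's rendering of DRMAA's undetermined job state, which can show up when qmaster momentarily loses track of a job. A minimal sketch for probing a job's state directly through the python drmaa bindings, to see what DRMAA itself reports (the job id is the one from the log sample above, purely illustrative):

import drmaa

s = drmaa.Session()
s.initialize()
# 202466 is the SGE job id from the log sample above.
print s.jobStatus('202466')  # e.g. drmaa.JobState.RUNNING / DONE / UNDETERMINED
s.exit()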
Tanguy LE CARROUR
Functional Genomics Center Zurich
ETH Zurich / University of Zurich