-------- Original Message --------
Subject: System built on Galaxy
Date: Wed, 12 Jan 2011 13:29:11 +0000
From: Andy Brown <setitesuk(a)gmail.com>
To: David Jackson <dj3(a)sanger.ac.uk>, Marina Gourtovaia
<mg8(a)sanger.ac.uk>, Guoying Qi <gq1(a)sanger.ac.uk>, svvd(a)sanger.ac.uk,
John O'Brien <jo3(a)sanger.ac.uk>
Just saw this web post about a system built on top of Galaxy. It does
the lot from sample/project creation through tracking, launching the
analysis pipeline and returning results.
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Happy new year to you!
First of all, well done on a fabulous system. It really is going to make my life as a bioinformatician a lot easier and hopefully empower my wet-lab biologists.
I'm trying to set up the 'Get microbial data' tool. Is there an easy way to get hold of these datasets with the location files and tool configuration files preset? I wouldn't like to have to set up thousands of genomes manually. Better still, can I set up my Galaxy instance to send the request to your server directly?
Dr Konrad Paszkiewicz
Exeter Sequencing Service,
University of Exeter,
Exeter EX4 4QD, UK.
I remember a post a while ago about the fetch closest feature tool
requesting that the output returns overlapping features rather than
just the nearest up/downstream feature.
Was this ever implemented? Perhaps as a different tool? Would the
easiest alternative be to create a workflow from existing tools
(join-> filter out intervals that have a join -> fetch closest feature
-> concatenate output files)?
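For reference, the behaviour being asked for (report overlapping features where there are any, otherwise fall back to the nearest feature) can be sketched in a few lines of Python. This is only an illustration of the logic, not the existing Galaxy tool; the function names and the (chrom, start, end) tuple layout with 0-based, half-open coordinates are assumptions made for the example.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def distance(a, b):
    """Gap between two non-overlapping intervals on the same chromosome."""
    if a[2] <= b[1]:          # a lies entirely upstream of b
        return b[1] - a[2]
    return a[1] - b[2]        # a lies entirely downstream of b


def overlapping_or_closest(query, features):
    """Return overlapping features, or the nearest one(s) if none overlap."""
    same_chrom = [f for f in features if f[0] == query[0]]
    hits = [f for f in same_chrom if overlaps(query, f)]
    if hits:
        return hits
    if not same_chrom:
        return []
    best = min(distance(query, f) for f in same_chrom)
    # Keep ties, so equidistant upstream and downstream features are both reported.
    return [f for f in same_chrom if distance(query, f) == best]
```

This mirrors the proposed workflow (join, filter, fetch closest, concatenate) in a single function, which may help when checking the output of a workflow built from the existing tools.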
Thanks for your help
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Currently lib/galaxy/config.py requires that custom job runners be
paired with exactly one egg for fetching. This seems less than ideal,
since runners may need 0 eggs (as is the case for a runner I am
implementing) or multiple eggs. Attached is a minor patch that addresses
this. Please consider it for addition to galaxy-central.
Thanks for your time,
University of Minnesota Supercomputing Institute
I'm trying to set up a mechanism to copy a local file using a setuid executable, based on whether the registered user has read access to the local file in question. Is there any means for a tool to know the registered user who is running it?
Helix Systems Staff
Slightly off-topic, but for a new Galaxy installation -
What would you recommend for a free (FOSS) Grid/Cluster management product?
SGE is not free any more (Oracle Grid Engine is "90-days evaluation, binaries only" free, starting with 6.2u6).
Ease of administration is top priority, more than sophisticated scheduling policies (and of course it has to be well supported by Galaxy).
Any suggestions?
I've got a user request for a converter from bowtie's map output to BED
format, and looking at the provided script it's mostly just an
application of cut(1) and sort(1).
Is this something Galaxy already does through some mechanism we're not
finding, or is this 3-line conversion script something I should be
adding and submitting back?
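To make the size of the conversion concrete, here is a minimal Python sketch of it. It is not the script mentioned above. It assumes bowtie's default tab-separated output, where the first five columns are read name, strand, reference name, 0-based leftmost offset and read sequence; the function name and the fixed score of 0 are choices made for this example.

```python
def bowtie_to_bed(lines):
    """Convert bowtie default-format alignment lines to sorted BED6 lines."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        name, strand, chrom, offset, seq = fields[:5]
        start = int(offset)            # bowtie offsets are already 0-based
        end = start + len(seq)         # BED end is exclusive
        records.append((chrom, start, end, name, 0, strand))
    # Sort by chromosome name, then start position (the sort(1) step).
    records.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(str(v) for v in r) for r in records]
```

Called as bowtie_to_bed(open("reads.map")), where reads.map is a placeholder file name, it returns the BED lines ready to be written out.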
Ry4an Brase 612-626-6575
Software Developer Application Development
University of Minnesota Supercomputing Institute http://www.msi.umn.edu
On Thu, Jan 6, 2011 at 7:52 AM, Bossers, Alex <Alex.Bossers(a)wur.nl> wrote:
> If the software has this option it's no problem to use them!
> Have a look at Peter's last blast+ wrappers. Blast+ of ncbi has
> the ability to specify a number of cores to use...and so can you
> by configuring it in a tool config. Regarding the parallelisation,
> I'm no expert in this. Have a look in the tool shed for the signalp
> and TMHMM wrappers. There you'll find a piece of python to
> split large jobs in batches, process them in parallel and merge
> them back.
> No experience with cluster or grid jobs myself...
Hi Yury (& Alex),
For a little clarification: like many computationally intensive command
line tools, the NCBI BLAST+ tools have a switch for the number of
processors. Currently (like most of the other Galaxy wrappers) this
is specified in the XML wrappers, in this case hard coded at 8. Some
of the other tools' XML files are hard coded with 4 threads (e.g. bwa).
In the case of TMHMM and SignalP, the tools themselves are single
threaded but I wrote a wrapper script (in Python) which divides the
input FASTA file into chunks and runs multiple instances of the tool
and then collates the output. Again, my wrapper tool is told how
many threads to use via the XML wrapper. You can find my Galaxy
wrappers for TMHMM and SignalP here at the "Galaxy Community
Tool Shed" (Alex has been testing them - thanks!):
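To illustrate the chunk-and-collate idea described above, here is a minimal Python sketch. It is not the Tool Shed wrapper itself. The function names, the chunk size and especially run_tool (a stand-in that just reports sequence lengths where a real wrapper would launch the external program on a temporary file) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records = []
    header, seq = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:].strip(), []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def chunks(records, size):
    """Yield successive batches of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def run_tool(chunk):
    """Hypothetical stand-in for one single-threaded run of the real tool."""
    return ["%s\t%d" % (header, len(seq)) for header, seq in chunk]


def run_in_parallel(text, threads=4, chunk_size=2):
    """Split the input, process batches concurrently, collate in input order."""
    batches = list(chunks(read_fasta(text), chunk_size))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the order the batches were submitted.
        results = pool.map(run_tool, batches)
        return [row for rows in results for row in rows]
```

Threads are enough here because, in a real wrapper, each worker would spend its time waiting on an external process; the threads argument plays the role of the value passed in from the XML wrapper.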
Some of the provided Galaxy wrappers have a note in the XML
saying the number of threads should be configurable, perhaps
via a loc file. I have suggested to the Galaxy developers there
should be a general setting for number of threads per tool
accessible via the XML, so that this can be configured centrally
(maybe I should file an enhancement issue for this):
(I've CC'd the galaxy-dev list, since this discussion is heading in
Something I want to do in several of my workflows is to filter a
FASTA file (or potentially other format sequence files) using a
list of desired identifiers (e.g. a column from a tabular file).
Right now I can achieve this with three steps in Galaxy.
Suppose I have:
* Dataset #1, FASTA file
* Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits,
  or filtered output from a sequence analysis tool)
1. Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1,
   subject to the enhancement proposed here:
2. Create tabular Dataset #4 using join on Datasets #2 and #3 using the
   matched identifier columns. This does the filtering.
3. Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.
This works (at least for reasonably sized datasets), but requires
three steps and the creation of at least two temporary files.
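For comparison, the same filtering can be sketched as a single pass over the FASTA file with no intermediate datasets. This is only a rough illustration, not a finished tool; the function names are made up for the example, and it assumes the identifier is the first whitespace-delimited word of each FASTA header.

```python
def load_ids(tabular_lines, column=0):
    """Collect identifiers from one column of a tab-separated file."""
    ids = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if column < len(fields):
            ids.add(fields[column].strip())
    return ids


def filter_fasta(fasta_lines, wanted):
    """Yield only the lines of records whose identifier is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            # Identifier assumed to be the first word after '>'.
            keep = bool(words) and words[0] in wanted
        if keep:
            yield line
```

One difference from the join-based route: records come out in FASTA order and each appears at most once, even if its identifier is listed several times in the table.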
I'd like to introduce another tool under "FASTA manipulation"
to do it in one step (rather than three). Am I going against
the apparent Galaxy ideal that complex manipulations should
be done with tabular files? Would such a FASTA filter tool be
of interest to add directly to Galaxy (e.g. under the "FASTA
manipulation" section), or better off on the community tool shed?
Here is my current implementation for discussion/consideration: