Please see some comments below.

On Fri, Dec 10, 2010 at 8:59 PM, <anton@bx.psu.edu> wrote:

---------------------------- Original Message ----------------------------
Subject: Re: [galaxy-dev] Wiki erratum...
From: "Richard Bruskiewich" <r.bruskiewich@irri.org>
Date: Fri, December 10, 2010 7:17 pm
To: "Anton Nekrutenko" <anton@bx.psu.edu>
--------------------------------------------------------------------------

Hi Anton,

The power of open source: many eyes... Glad to be of help.

BTW, thank you for all your tutorial videos... they are excellent. I present
them to my staff as an example of how to empower end users so they can work
more independently.

I am located at the International Rice Research Institute (IRRI;
www.irri.org) in the Philippines where I've been for over a decade working
on rice genomics. Due to recent strategic research restructuring here, I
now have the excuse, after years of senior research management, of simply
being a bioinformatics hacker again. It's both fun and frustrating.

I'm just getting seriously started with Galaxy although I've known about
the platform for some years now. It is a very exciting tool. I can't wait
to put it to good use in our projects here.

In particular, NGS data sets are starting to pour in from many IRRI
projects. Galaxy promises to make the analysis of such data tractable,
documented and efficient.

In fact, in 2011, we may be resequencing up to 10,000 new rice genomes.
Galaxy on the Amazon cloud is a godsend for this, although I'm patiently
waiting for the AMI to be cloned to the ap-southeast region in Singapore,
where we do most of our computing deployments (since we are in Asia). I've
also been told that the next release will also include the Michael Smith Genome
Sciences Center ABySS assembler... I'm keen on using that software
within Galaxy.

On that note, a technical question I'm curious about: does Galaxy's
configuration currently allow specific tools to run on instances of a specific
size? For example, if I fire up a Galaxy CloudMan cluster with a few
large-RAM Amazon instances/nodes, can I request that particular
software components (e.g. assemblers like ABySS) run only, or preferably, on
those high-capacity nodes?

Currently, Galaxy Cloud allows a cluster to be composed of multiple instance types, but the assignment of jobs to instances is handled by the job manager (i.e., SGE), so a specific job cannot be targeted at a specific instance type; we should eventually provide support for this type of functionality. In the meantime, a cluster can be composed of the instance types that match the current workload, and the instance types can then be changed as the workload changes.

Also, Amazon has so-called "cluster" instances, and now GPU cluster
instances. Again, the same idea applies: can specific tools be told to only
run on such a cluster instance? Further ahead, could Galaxy be configured
to automatically start/stop specific instances only when needed (including
cluster instances)?

MPI-type jobs are the only true beneficiaries of cluster instances, only a handful of bioinformatics tools are actually implemented using MPI, and those instances require a different AMI. For these reasons, we do not currently support that type of instance; maybe down the line.

Nonetheless, in the upcoming version of Galaxy Cloud (currently being tested), the application will be able to automatically scale the size of the cluster based on the current workload.
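To make the idea of workload-based scaling concrete, here is a minimal sketch of one possible sizing policy. It is not CloudMan's actual algorithm; the names `ScalingPolicy`, `target_node_count`, and `scaling_action`, and the assumption that every worker node runs a fixed number of concurrent jobs, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical autoscaling limits (not CloudMan's real settings)."""
    min_nodes: int = 1
    max_nodes: int = 20
    jobs_per_node: int = 2  # assumed number of job slots per worker node


def target_node_count(queued: int, running: int, policy: ScalingPolicy) -> int:
    """Return how many worker nodes the cluster should have.

    Sizes the cluster so that all queued plus running jobs fit
    (ceiling division), then clamps to [min_nodes, max_nodes].
    """
    demand = queued + running
    needed = -(-demand // policy.jobs_per_node)  # ceiling division
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(current: int, queued: int, running: int,
                   policy: ScalingPolicy) -> int:
    """Positive: nodes to add; negative: nodes to remove; 0: no change."""
    return target_node_count(queued, running, policy) - current
```

For example, with `ScalingPolicy(min_nodes=1, max_nodes=5, jobs_per_node=2)`, a one-node cluster holding 3 queued and 2 running jobs would be grown by 2 nodes, while an idle four-node cluster would shrink back to the single-node minimum. A real implementation would also need to avoid terminating nodes that are still running jobs.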

Thanks for your interest,

Enis

I know... probably forging recklessly ahead here. I hope to have a stronger
computing science staff on board in a few months, which may allow me to
explore such topics more proactively, but for now I'm simply wondering about the
state of the art here.

I hope that once I get more familiar with the platform, I'll be able to
contribute back more. I'm configuring Galaxy to connect to rice genome data,
and there are some other tools I think might be useful in the platform (for
our work, anyhow) so I'll get them in, then share the configuration files
with the community. Maybe the deeper I dig, the more useful I'll get :-).

Cheers
Richard

--
*Richard Bruskiewich, PhD*
Senior Scientist, Computational and Systems Biology
Applications Team for Computational Genomics
T.T. Chang Genetic Resources Center
International Rice Research Institute

On Fri, Dec 10, 2010 at 9:50 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:

> Richard:
>
> This beauty was mine. Thanks for pointing this out. It is now fixed.
>
> Thanks,
>
> anton
>
>
> On Dec 9, 2010, at 10:04 PM, Richard Bruskiewich wrote:
>
> Galaxy Colleagues,
>
> I don't know who is maintaining the Galaxy wiki page at
> http://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup but I
> noticed that the Python script under the Megablast instructions has an
> error: the "defline" assignment inside the "line.startswith" branch should
> be moved *after* the "if length > 0" block; otherwise, defline is
> overwritten before the previous sequence is written out. This results in a
> frameshift in the FASTA header identifiers (i.e., each sequence gets the
> next sequence's identifier).
>
> I've commented out the erroneous defline below and added the right one:
>
> import sys
>
> length = 0
> defline = ''
> seq = []
>
> for line in sys.stdin:
>     line = line.rstrip('\r\n')
>
>     if line.startswith('>'):
>         # defline = line.split('|')[1]  # defline should NOT be here
>         if length > 0:
>             print(">%s_%s" % (defline, length))
>             print("\n".join(seq))
>             length = 0
>             seq = []
>         defline = line.split('|')[1]  # defline should be here
>     else:
>         seq.append(line)
>         length += len(line)
>
> print(">%s_%s" % (defline, length))
> print("\n".join(seq))
>
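For readers on Python 3, the corrected logic can also be written as a generator. This is a sketch under the same assumption as the wiki script: the identifier is the second '|'-separated field of each header (as in NCBI-style `gi|...` deflines). The function name `relabel_fasta` is hypothetical. Unlike the script above, it flushes a record whenever a header has been seen (rather than only when `length > 0`), so a header with an empty sequence is still written out, and empty input produces no output at all.

```python
from typing import Iterable, Iterator


def relabel_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA records with each header rewritten as >ID_LENGTH.

    Assumes the identifier is the second '|'-separated header field.
    """
    defline = None
    seq = []
    length = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Write out the previous record *before* reading the new
            # header, so each sequence keeps its own identifier.
            if defline is not None:
                yield ">%s_%d" % (defline, length)
                yield from seq
            defline = line.split("|")[1]
            seq = []
            length = 0
        else:
            seq.append(line)
            length += len(line)
    # Write out the final record (nothing is emitted for empty input).
    if defline is not None:
        yield ">%s_%d" % (defline, length)
        yield from seq
```

Used as a stdin filter, it would be driven by `for rec in relabel_fasta(sys.stdin): print(rec)` after `import sys`.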
> While on the topic of this page, perhaps the software versions need to be
> revisited. Megablast has already been superseded by BLAST+. Perhaps new
> releases of Galaxy should update this?
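As a rough illustration of that migration, a legacy `megablast` search corresponds in BLAST+ to `blastn -task megablast`. The Python sketch below only assembles and runs such a command. The helper names `megablast_cmd` and `run_megablast`, the file and database names, and the choice of tabular output (`-outfmt 6`) are assumptions for illustration; the wiki page does not specify them.

```python
import shutil
import subprocess
from typing import List


def megablast_cmd(query: str, db: str, out: str, threads: int = 1) -> List[str]:
    """Build a BLAST+ command line for a megablast-style nucleotide search."""
    return [
        "blastn",
        "-task", "megablast",   # megablast algorithm inside BLAST+ blastn
        "-query", query,
        "-db", db,
        "-outfmt", "6",         # tabular output (assumed choice)
        "-num_threads", str(threads),
        "-out", out,
    ]


def run_megablast(query: str, db: str, out: str, threads: int = 1) -> None:
    """Run the search, failing early if BLAST+ is not installed."""
    if shutil.which("blastn") is None:
        raise RuntimeError("BLAST+ 'blastn' not found on PATH")
    subprocess.run(megablast_cmd(query, db, out, threads), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before a long search is launched.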
>
> BTW, when is the new Galaxy release (CloudMan AMI too...) coming out? I
> heard rumors that it was due this week.
>
> Cheers
> Richard Bruskiewich
>
> --
> *Richard Bruskiewich, PhD*
> Senior Scientist, Computational and Systems Biology
> Applications Team for Computational Genomics
> T.T. Chang Genetic Resources Center
> International Rice Research Institute
>
> _______________________________________________
> galaxy-dev mailing list
> galaxy-dev@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
>
> Anton Nekrutenko
> http://nekrut.bx.psu.edu
> http://usegalaxy.org

_______________________________________________
galaxy-lab mailing list
galaxy-lab@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-lab