Best practices with data on clusters

Cittaro Davide

20 Dec 2011

10:04 a.m.

Hi developers,

I have a question that may be OT, but since Galaxy can work in a clustered environment with a queueing system, I'll try asking here. Is there anybody here who copies data to a local temporary directory before performing any analysis step and then copies it back into the "final results"?

Thanks

d

Sent from my iPad


Nate Coraor

3 Jan 2012

9:15 p.m.

On Dec 20, 2011, at 5:04 AM, Cittaro Davide wrote:


Hi developers, I have a question that may be OT, but since Galaxy can work in a clustered environment with a queueing system, I'll try asking here. Is there anybody here who copies data to a local temporary directory before performing any analysis step and then copies it back into the "final results"?

Hi Davide,

We did this for a while when we had a poorly performing fileserver. It can reduce load in that environment, but in cases where you are only going to read small portions of the input files, you'll probably see longer execution times. The same applies if you'll simply be writing the output(s) in one big stream, since you then have to write the data once locally and then again over the network.

That said, if you have a lot of interim steps that produce large data that then get merged via some process back into final outputs, it absolutely makes sense to use local disk for those steps (assuming local disk is large enough - another problem that we sometimes encounter).

--nate
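The copy-in, compute, copy-back pattern discussed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Galaxy or any particular cluster implements it: `run_staged`, the `step` callable, `scratch_root` and the `headroom` factor are hypothetical names chosen for this example. The free-space check reflects the caveat that local disk is sometimes too small.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_path, output_path, step, scratch_root=None, headroom=2.0):
    """Copy the input to local scratch, run `step` there, copy the result back.

    `step` is a hypothetical callable taking (local_input, local_output) paths.
    `headroom` is an assumed multiplier on the input size, not a measured value.
    """
    input_path = Path(input_path)
    output_path = Path(output_path)
    scratch_root = Path(scratch_root or tempfile.gettempdir())

    # Refuse to stage when local disk looks too small for input plus output.
    needed = input_path.stat().st_size * headroom
    if shutil.disk_usage(scratch_root).free < needed:
        raise OSError(f"not enough free space under {scratch_root}")

    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        local_in = Path(scratch) / input_path.name
        local_out = Path(scratch) / ("out_" + output_path.name)
        shutil.copy2(input_path, local_in)    # stage in: one full read over the network
        step(local_in, local_out)             # all analysis I/O hits local disk
        output_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_out, output_path)  # stage out: one full write over the network
    return output_path
```

Note that the stage-in copy always reads the whole input, which is why this pattern loses when a tool only touches a small portion of a large file.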


Thanks

d

Sent from my iPad ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

Cittaro Davide

4 Jan 2012

11:03 a.m.

Hi Nate,

On Jan 3, 2012, at 10:15 PM, Nate Coraor wrote:

That said, if you have a lot of interim steps that produce large data that then get merged via some process back to final outputs, it absolutely makes sense to use local disk for those steps (assuming local disk is large enough - another problem that we sometimes encounter).

Wouldn't that mean that most of the workflows dealing with NGS data should run on local disks?

d

/* Davide Cittaro, PhD

Head of Bioinformatics Core
Center for Translational Genomics and Bioinformatics
San Raffaele Scientific Institute
Via Olgettina 58
20132 Milano
Italy

Office: +39 02 26439140
Mail: cittaro.davide@hsr.it
Skype: daweonline */

Nate Coraor

4:43 p.m.

On Jan 4, 2012, at 6:03 AM, Cittaro Davide wrote:


Hi Nate,

On Jan 3, 2012, at 10:15 PM, Nate Coraor wrote:

That said, if you have a lot of interim steps that produce large data that then get merged via some process back to final outputs, it absolutely makes sense to use local disk for those steps (assuming local disk is large enough - another problem that we sometimes encounter).

Wouldn't that mean that most of the workflows dealing with NGS data should run on local disks?

It depends on the location and ordering of the steps. If you're parallelizing single steps across multiple nodes, it wouldn't make sense. If you run multiple steps serially on a single node, then you could work locally between those steps.

--nate
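As a rough illustration of the serial case, the sketch below chains several steps on one node so that only the first read and the last write touch shared storage. It is a sketch under assumptions, not a Galaxy feature: `run_serial_locally`, `steps` and `scratch_root` are hypothetical names, and each step is assumed to be a callable taking an input path and an output path.

```python
import shutil
import tempfile
from pathlib import Path


def run_serial_locally(input_path, output_path, steps, scratch_root=None):
    """Run `steps` in order on local scratch; only the final output leaves the node."""
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        current = Path(scratch) / "step0"
        shutil.copy2(input_path, current)    # single read from shared storage
        for i, step in enumerate(steps, start=1):
            nxt = Path(scratch) / f"step{i}"
            step(current, nxt)               # intermediate data stays on local disk
            current.unlink()                 # free local space once it is consumed
            current = nxt
        shutil.copy2(current, output_path)   # single write back to shared storage
    return Path(output_path)
```

This only helps when all steps land on the same node. If a single step is split across nodes, each node would need its own copy of the data, which is the case described above as not making sense.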


d

/* Davide Cittaro, PhD

Head of Bioinformatics Core Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute Via Olgettina 58 20132 Milano Italy

Office: +39 02 26439140 Mail: cittaro.davide@hsr.it Skype: daweonline */

