Re: [galaxy-dev] Plans for workflow & parallelisation work?

22 Feb 2016

On Mon, Feb 22, 2016 at 7:57 AM, Peter van Heusden <pvh@sanbi.ac.za> wrote:
...
Hi there
...
4) Currently parallelisation in Galaxy is supported using two mechanisms:
collections and dataset splitters/tasks. Are there plans on extending and
harmonising Galaxy's parallelisation capabilities?

I'm not sure there is anything formal, but chatting to John and others
at GCC2015 we recognised that the split/merge capabilities in the
Python datatype classes overlap heavily with splitting a dataset into
a collection and merging a collection back into a single dataset.

https://wiki.galaxyproject.org/Events/GCC2015/BoFs/DataSplittingAndParalleli...

One idea we mooted was defining (pseudo) tools for dataset splitting
and merging using the existing datatype classes, with similar integration
into the framework as the datatype converter tools.

For example, you could in principle merge a collection of text files
using the text datatype's merge functionality (which is essentially a
cat command).
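
To make the "essentially a cat command" point concrete, here is a
minimal standalone sketch of such a merge. It is not Galaxy's actual
datatype code: the function name merge_text_files and its signature
are made up for illustration, and it assumes the collection elements
are plain files on disk, supplied in the order they should appear.

```python
import shutil
from pathlib import Path
from typing import Iterable


def merge_text_files(parts: Iterable[Path], output: Path) -> None:
    """Concatenate the files in `parts` into `output`, like `cat`.

    Hypothetical helper for illustration only, not a Galaxy API.
    Files are copied as bytes, in the order given, with no separator
    added between them.
    """
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large files are not read into memory.
                shutil.copyfileobj(src, out)
```

Copying bytes rather than decoded text keeps the sketch agnostic about
encodings, which matches what cat itself does.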

There are a lot of details to think about, particularly for splitting,
where tool wrappers that use parallelisation currently have some
control over how the data is divided (e.g. split a large FASTA file
into chunks of 1000 sequences). That control might need to be exposed
in any UI for creating a collection from a single file.
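
As a rough illustration of the kind of parameter that would need
exposing, here is a standalone sketch of splitting FASTA by record
count. Again this is not Galaxy's splitter: split_fasta and its
chunk_size argument are hypothetical names, and it assumes every
record starts with a ">" header line.

```python
from typing import Iterable, Iterator, List


def split_fasta(lines: Iterable[str], chunk_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most `chunk_size` records.

    Hypothetical helper for illustration only, not a Galaxy API.
    A record is assumed to begin at a line starting with ">".
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    records = 0
    for line in lines:
        if line.startswith(">"):
            # Start a new chunk once the current one is full.
            if records == chunk_size:
                yield chunk
                chunk, records = [], 0
            records += 1
        chunk.append(line)
    # Emit the final, possibly partial, chunk.
    if chunk:
        yield chunk
```

Each yielded chunk could then be written out as one element of the
new collection, with chunk_size being the value a user would set in
the UI.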

Peter

Peter Cock