On Mon, Feb 22, 2016 at 7:57 AM, Peter van Heusden <pvh@sanbi.ac.za> wrote:
Hi there
...
4) Currently parallelisation in Galaxy is supported using two mechanisms: collections and dataset splitters/tasks. Are there plans on extending and harmonising Galaxy's parallelisation capabilities?
I'm not sure there is anything formal, but chatting to John and others at GCC2015 we recognised that there is a lot of functional overlap between the split/merge capabilities in the Python datatype classes and the splitting of datasets into collections (and the merging of collections back into a single dataset):

https://wiki.galaxyproject.org/Events/GCC2015/BoFs/DataSplittingAndParalleli...

One idea we mooted was defining (pseudo) tools for dataset splitting and merging using the existing datatype classes, integrated into the framework in a similar way to the datatype converter tools. For example, you could in principle merge a collection of text files using the text datatype's merge functionality (which is essentially a cat command).

There are a lot of details to think about, particularly for splitting, where tool wrappers using parallelisation currently have some control over how the input is divided (e.g. split a large FASTA file into chunks of 1000 sequences). That control might need to be exposed in any UI for creating a collection from a single file.

Peter
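To make the split/merge idea above concrete, here is a minimal standalone sketch. The function names (`merge_text_files`, `iter_fasta_records`, `split_fasta`) and the `chunk_size` parameter are hypothetical and chosen for illustration only; they are not Galaxy's datatype API, whose real split/merge methods live on the datatype classes with their own signatures. Merging is plain concatenation (the "essentially cat" behaviour), while splitting has to respect record boundaries, which is why a FASTA-aware splitter needs a user-facing parameter such as the number of sequences per chunk.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def merge_text_files(split_files: Iterable[str], output_file: str) -> None:
    """Concatenate text files in the given order, like `cat a b c > out`."""
    with open(output_file, "w") as out:
        for path in split_files:
            with open(path) as handle:
                for line in handle:  # stream line by line to keep memory use flat
                    out.write(line)


def iter_fasta_records(path: str) -> Iterator[List[str]]:
    """Yield each FASTA record as a list of lines, header line first.

    Any lines before the first '>' header are ignored.
    """
    record: List[str] = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if record:
                    yield record
                record = [line]
            elif record:
                record.append(line)
    if record:
        yield record


def _write_chunk(chunk: List[List[str]], out_dir: Path, index: int) -> str:
    """Write one chunk of records to a numbered file and return its path."""
    target = out_dir / f"part_{index:05d}.fasta"
    with open(target, "w") as handle:
        for record in chunk:
            handle.writelines(record)
    return str(target)


def split_fasta(path: str, out_dir: str, chunk_size: int = 1000) -> List[str]:
    """Split a FASTA file into files of at most `chunk_size` records each.

    Returns the chunk paths in order, so merging them back reproduces the input.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[str] = []
    chunk: List[List[str]] = []
    for record in iter_fasta_records(path):
        chunk.append(record)
        if len(chunk) == chunk_size:
            paths.append(_write_chunk(chunk, out, len(paths)))
            chunk = []
    if chunk:  # final, possibly smaller, chunk
        paths.append(_write_chunk(chunk, out, len(paths)))
    return paths
```

Splitting a 2500-sequence FASTA file with `chunk_size=1000` gives three chunks (1000, 1000 and 500 records), and `merge_text_files` on those chunks in order reproduces the original file. In a collection-based design, each chunk would become one collection element.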