Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

16 Feb 2012


On Thu, Feb 16, 2012 at 1:53 PM, Fields, Christopher J
<cjfields@illinois.edu> wrote:
...
Makes sense from my perspective; splits have to be defined based on
data type.  It could be as low-level as defining a simple iterator per
record, then a wrapper that allows a specific chunk-size.  The split
file creation could almost be abstracted completely away into a
common method.

I'm trying to understand exactly how the current code creates the
splits, but yes - something like that is what I would expect.
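
To make the idea concrete, here is a rough sketch of a per-record iterator plus a generic chunk-size wrapper. This is not how Galaxy's current code does it; the names fasta_records and chunked are made up for illustration, and FASTA is just an example data type.

```python
import io
from itertools import islice


def fasta_records(handle):
    """Yield one FASTA record (header plus sequence lines) at a time.

    Hypothetical per-data-type iterator; other formats would supply their own.
    """
    record = []
    for line in handle:
        # A new header line closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def chunked(records, chunk_size):
    """Group any record iterator into lists of at most chunk_size records.

    This wrapper is format-agnostic, so it could be shared by all data types.
    """
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Three records split into chunks of two records each.
example = io.StringIO(">a\nACGT\n>b\nGG\nTT\n>c\nA\n")
chunks = list(chunked(fasta_records(example), 2))
```

Only fasta_records knows anything about the format; writing each chunk out to its own split file could then live in one common method.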
...
As Peter implies, maybe a simple API for defining a split method
would be all that is needed.  Might also be useful on any merge
step, 'cat'-like merges won't work for every format but would be
a suitable default.

Yes, for a lot of file types concatenation is fine. Again, like the
splitting, this has to be, and is, defined at the data type level (which
is a hierarchy of classes in Galaxy).
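
As a rough illustration of a 'cat'-like default with per-type overrides, something along these lines would do. The class names DataType and HeaderedTable are hypothetical and are not Galaxy's actual classes; the sketch works on in-memory strings rather than files to keep it short.

```python
class DataType:
    """Hypothetical base type: merging defaults to plain concatenation."""

    @staticmethod
    def merge(parts):
        return "".join(parts)


class HeaderedTable(DataType):
    """Hypothetical type whose parts each begin with one header line."""

    @staticmethod
    def merge(parts):
        merged = []
        for index, part in enumerate(parts):
            lines = part.splitlines(keepends=True)
            # Keep the header from the first part only; drop it elsewhere.
            merged.extend(lines if index == 0 else lines[1:])
        return "".join(merged)


# FASTA-like data is happy with the default concatenation.
plain = DataType.merge([">a\nAC\n", ">b\nGT\n"])

# A table with a header row needs the override to avoid repeated headers.
table = HeaderedTable.merge(["id\tlen\na\t2\n", "id\tlen\nb\t2\n"])
```

Types that do not override merge simply inherit concatenation, which matches the "suitable default" suggested above.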

Peter

Peter Cock