On Thu, Feb 16, 2012 at 1:53 PM, Fields, Christopher J <cjfields@illinois.edu> wrote:
Makes sense from my perspective; splits have to be defined based on data type. It could be as low-level as defining a simple iterator per record, then a wrapper that allows a specific chunk-size. The split file creation could almost be abstracted completely away into a common method.
I'm trying to understand exactly how the current code creates the splits, but yes - something like that is what I would expect.
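To make the iterator-plus-wrapper idea concrete, here is a minimal sketch. It is not the current Galaxy code. The names `fasta_records` and `chunked` are hypothetical, and FASTA is used only as an example of a per-data-type record iterator.

```python
from itertools import islice


def fasta_records(handle):
    """Yield one FASTA record (header line plus sequence lines) at a time.

    Hypothetical example of the per-data-type piece: only this function
    needs to know where one record ends and the next begins.
    """
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def chunked(records, chunk_size):
    """Format-agnostic wrapper: group any record iterator into lists of
    at most chunk_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

With this shape, writing the split files is a generic loop over `chunked(...)`, and a new data type only has to supply its own record iterator.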
As Peter implies, maybe a simple API for defining a split method would be all that is needed. It might also be useful on any merge step; 'cat'-like merges won't work for every format but would be a suitable default.
Yes, for a lot of file types concatenation is fine. Again, like the splitting, this has to be, and is, defined at the data type level (which is a hierarchy of classes in Galaxy).
Peter
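As an illustration of a concatenating default that a subclass can override, a rough sketch follows. The class names `BaseType` and `HeaderedTable` are hypothetical stand-ins and are not taken from Galaxy's actual data type classes.

```python
import shutil


class BaseType:
    """Hypothetical root data type: merging defaults to 'cat'-like
    concatenation of the parts, in the order given."""

    @staticmethod
    def merge(part_paths, output_path):
        with open(output_path, "wb") as out:
            for path in part_paths:
                with open(path, "rb") as part:
                    shutil.copyfileobj(part, out)


class HeaderedTable(BaseType):
    """Hypothetical subtype where every part repeats a one-line header,
    so plain concatenation would duplicate it."""

    @staticmethod
    def merge(part_paths, output_path):
        with open(output_path, "wb") as out:
            for index, path in enumerate(part_paths):
                with open(path, "rb") as part:
                    header = part.readline()
                    if index == 0:
                        out.write(header)  # keep the header only once
                    shutil.copyfileobj(part, out)  # copy the remaining rows
```

Formats for which concatenation is fine inherit `merge` unchanged; only those that need something else override it.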