On 12-05-03 9:51 AM, "Peter Cock" <p.j.a.cock@googlemail.com> wrote:
Hello all,
Currently the Galaxy experimental task splitting code allows splitting into N chunks, e.g. 8 parts, ...
Alternatively, it can split into chunks of at most size N (the units depend on the file type, e.g. lines in a tabular file or number of sequences in FASTA/FASTQ), e.g. at most 1000 sequences:
...
I would prefer to be able to set both limits - in this case tell Galaxy to try to use at least 8 parts, each of at most 1000 sequences. ...
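To make the combined rule concrete, here is a rough sketch of how the two limits might interact. It is not the code in the Galaxy split implementation: the function name plan_chunk_sizes and its parameters are made up for illustration, and it assumes the number of records (e.g. sequences) is already known. It also treats "at least N parts" as best effort, so a file with fewer records than N just gets one record per part.

```python
def plan_chunk_sizes(total_records, min_parts, max_chunk_size):
    """Return a list of chunk sizes honouring both limits.

    Hypothetical helper, not part of Galaxy. Aims for at least
    `min_parts` chunks, each holding at most `max_chunk_size` records.
    """
    if min_parts < 1 or max_chunk_size < 1:
        raise ValueError("min_parts and max_chunk_size must be >= 1")
    if total_records <= 0:
        return []
    # Fewest parts that keeps every chunk within the size limit
    # (integer ceiling division).
    parts_for_size = -(-total_records // max_chunk_size)
    parts = max(min_parts, parts_for_size)
    # Never create empty chunks when there are fewer records than parts.
    parts = min(parts, total_records)
    # Spread the records as evenly as possible over the parts.
    base, extra = divmod(total_records, parts)
    return [base + 1] * extra + [base] * (parts - extra)
```

With 20000 sequences, at least 8 parts and at most 1000 per part, the size limit dominates and this gives 20 chunks of 1000. With 4000 sequences, the part count dominates and it gives 8 chunks of 500.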
Does this sound sufficiently general? The split code is still rather experimental so I don't expect breaking the API to be a big issue (not many people are using it).
Peter
On Thu, May 3, 2012 at 6:01 PM, Paul Gordon <gordonp@ucalgary.ca> wrote:
+1. This is especially useful for us, with hardware-accelerated algorithms having limits on input size.
I've got this working with FASTA files on our Galaxy at the moment and, touch wood, it is behaving nicely. The code doesn't yet handle splitting other input file formats, which would be required before applying this to the trunk, but some feedback on whether the Galaxy team is keen on this direction would be appreciated:
https://bitbucket.org/peterjc/galaxy-central/changeset/aa98de8effd1
(I'll have to sort out the branches since this is now mixed up with BLAST database work, but that is an aside.)
Peter