Hello all,

Currently the Galaxy experimental task splitting code allows splitting into N chunks, e.g. 8 parts, with:

<parallelism method="multi" split_mode="number_of_parts" split_size="8" ... />

Or, into chunks of at most size N (units depend on the file type, e.g. lines in a tabular file or number of sequences in FASTA/FASTQ), e.g. at most 1000 sequences:

<parallelism method="multi" split_mode="to_size" split_size="1000" ... />

As an aside, I found it confusing that the meaning of the "split_size" attribute depends on the "split_mode" (number of jobs, or size of jobs).

I would prefer to be able to set both sizes - in this case, tell Galaxy to try to use at least 8 parts, each of at most 1000 sequences. Thus in a BLAST task, initially the split would be (up to) eight ways:

8 queries => 8 jobs each with 1 query
80 queries => 8 jobs each with 10 queries
800 queries => 8 jobs each with 100 queries
8000 queries => 8 jobs each with 1000 queries

Then, once the maximum chunk size comes into play, you'd just get more jobs:

9000 queries => 9 jobs each with 1000 queries
10000 queries => 10 jobs each with 1000 queries
20000 queries => 20 jobs each with 1000 queries
etc.

The appeal of this is that it takes advantage of parallelism for both small jobs (under 100 queries) and large jobs (1000s of queries), while still allowing a maximum size to be imposed on each cluster job.

The problem is that this requires changing the XML tags, getting rid of the current two modes in favour of this combined one. Perhaps this:

<parallelism method="multi" min_jobs="8" max_size="1000" ... />

The jobs threshold isn't strictly a minimum - if you have N < 8 query sequences, you'd just get N jobs of 1 query each.

Does this sound sufficiently general? The split code is still rather experimental, so I don't expect breaking the API to be a big issue (not many people are using it).

Peter
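
P.S. To make the intended behaviour concrete, here is a rough Python sketch of how the combined mode could pick the number of jobs and the chunk size. The function name plan_split and its arguments min_jobs/max_size are just illustrative - this is not actual Galaxy code, only the arithmetic behind the figures above:

import math

def plan_split(total, min_jobs=8, max_size=1000):
    """Return (number_of_jobs, records_per_chunk) for the proposed combined mode.

    'total' is the number of records (e.g. query sequences) in the input.
    Illustrative sketch only, not actual Galaxy code.
    """
    if total <= 0:
        return 0, 0
    if total <= min_jobs:
        # Fewer records than the target job count: one record per job.
        return total, 1
    # Aim for min_jobs parts, but never exceed max_size records per part.
    chunk = min(math.ceil(total / min_jobs), max_size)
    jobs = math.ceil(total / chunk)
    return jobs, chunk

# Matches the examples above (the last chunk may be smaller):
# plan_split(8)     -> (8, 1)
# plan_split(800)   -> (8, 100)
# plan_split(9000)  -> (9, 1000)
# plan_split(20000) -> (20, 1000)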