Thanks Peter. I see, <parallelism> works on a single large file by splitting it and using multiple instances to process the pieces in parallel. In our case we use the 'composite' data type, which is simply an array of input files, and we would like to process those files in parallel instead of having a 'foreach' loop in the tool wrapper. Is that possible? We are looking at CloudMan for creating a cluster in Galaxy now.

-Alex

-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock@googlemail.com]
Sent: Thursday, 7 February 2013 9:09 PM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing

On Wed, Feb 6, 2013 at 11:43 PM, <Alex.Khassapov@csiro.au> wrote:
Hi All,
Can anybody please add a few words on how we can use the "initial implementation" which "exists in the tasks framework"?
-Alex
To enable this, set use_tasked_jobs = True in your universe_wsgi.ini file. The tools must also be configured to allow this via the <parallelism> tag. Many of my tools do this; for example, see the NCBI BLAST+ wrappers in the tool shed.

Additionally, the data file formats must support being split and merged, which is done via Python code in the Galaxy datatype definitions (see the split and merge methods in lib/galaxy/datatypes/*.py). Some other relevant Python code is in lib/galaxy/jobs/splitters/*.py

Peter
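[Editor's note: a minimal sketch of the two configuration pieces Peter describes. The job-splitting switch goes in universe_wsgi.ini:

    # universe_wsgi.ini -- turn on the tasks framework so that jobs
    # for suitably-configured tools can be split into tasks
    use_tasked_jobs = True

and the tool wrapper opts in via the <parallelism> tag. The attribute values below follow the pattern of the NCBI BLAST+ wrappers from memory (split the 'query' input into chunks of up to 1000 sequences, then merge 'output1'); check the wrappers in the tool shed for the authoritative form:

    <tool id="example_tool" name="Example tool">
        <!-- run each chunk of the query as a separate task -->
        <parallelism method="multi" split_inputs="query"
                     split_mode="to_size" split_size="1000"
                     merge_outputs="output1"></parallelism>
        ...
    </tool>
]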
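[Editor's note: on Alex's composite question — since splitting and merging live on the datatype, processing an array of files in parallel would mean overriding the split and merge methods on the composite datatype itself, so the machinery in lib/galaxy/jobs/splitters/*.py can hand each component file to its own task. A rough, untested sketch; the class name and file layout are hypothetical, and the method signatures are assumed from lib/galaxy/datatypes/data.py of that era:

    import os
    import shutil

    from galaxy.datatypes.data import Data  # assumed base class location


    class FileArray(Data):
        """Hypothetical composite datatype whose extra_files_path holds
        an array of input files to be processed independently."""

        @classmethod
        def split(cls, input_datasets, subdir_generator_function, split_params):
            # Assumed contract: call subdir_generator_function() once per
            # piece to get a fresh task directory, and place that task's
            # input file there.
            if split_params is None:
                return None
            if len(input_datasets) > 1:
                raise Exception("This sketch handles a single composite input only")
            dataset = input_datasets[0]
            for name in sorted(os.listdir(dataset.extra_files_path)):
                subdir = subdir_generator_function()
                # Name each piece after the dataset's primary file so the
                # tool command line works unchanged in each task directory.
                shutil.copy(os.path.join(dataset.extra_files_path, name),
                            os.path.join(subdir, os.path.basename(dataset.file_name)))

        @staticmethod
        def merge(split_files, output_file):
            # Simplest possible recombination: concatenate per-task outputs.
            with open(output_file, 'wb') as out:
                for fname in split_files:
                    with open(fname, 'rb') as f:
                        shutil.copyfileobj(f, out)

Note that merge runs on the datatype of the tool's *output*, so a real implementation would define it on whichever datatype the per-task results need recombining into.]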