Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing
Hi All,

Can anybody please add a few words on how we can use the “initial implementation” which “exists in the tasks framework”?

-Alex

From: Trello [mailto:do-not-reply@trello.com]
Sent: Wednesday, 6 February 2013 10:58 AM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Subject: 4 new notifications on the board Galaxy: Development since 5:56 PM (Tuesday)

Notifications on Galaxy: Development <https://trello.com/board/galaxy-development/506338ce32ae458f6d15e4b3>

- James Taylor added Dannon Baker to the card 79: Split large jobs over multiple nodes for processing <https://trello.com/card/79-split-large-jobs-over-multiple-nodes-for-processing/506338ce32ae458f6d15e4b3/411>
- James Taylor commented on the card 79: Split large jobs over multiple nodes for processing: "An initial implementation exists in the tasks framework."
- James Taylor moved the card 79: Split large jobs over multiple nodes for processing to Complete
- James Taylor moved the card 137: allow multiple="true" in input param fields of type data <https://trello.com/card/137-allow-multiple-true-in-input-param-fields-of-type-data/506338ce32ae458f6d15e4b3/292> to Pull Requests / Patches
On Wed, Feb 6, 2013 at 11:43 PM, <Alex.Khassapov@csiro.au> wrote:
Hi All,
Can anybody please add a few words on how we can use the “initial implementation” which “exists in the tasks framework”?
-Alex
To enable this, set use_tasked_jobs = True in your universe_wsgi.ini file. The tools must also be configured to allow this via the <parallelism> tag. Many of my tools do this; for example, see the NCBI BLAST+ wrappers in the Tool Shed.

Additionally, the data file formats must support being split or being merged, which is done via Python code in the Galaxy datatype definition (see the split and merge methods in lib/galaxy/datatypes/*.py). Some other relevant Python code is in lib/galaxy/jobs/splitters/*.py.

Peter
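To make the split and merge idea above concrete, here is a minimal standalone sketch in Python. It is not Galaxy's code: the names split_fasta, run_tool and merge_outputs are hypothetical, and the real logic lives in the datatype split and merge methods mentioned above. It assumes a FASTA-like input where each record starts with ">" and an output that can be merged by plain concatenation.

```python
def split_fasta(text, records_per_chunk):
    """Split FASTA text into chunks of at most records_per_chunk records."""
    records = []
    current = []
    for line in text.splitlines():
        # A new header line closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + records_per_chunk]) + "\n"
        for i in range(0, len(records), records_per_chunk)
    ]


def run_tool(chunk):
    """Stand-in for the real tool: report each record ID and sequence length."""
    out = []
    for record in chunk.strip().split("\n>"):
        header, *seq = record.lstrip(">").splitlines()
        out.append(f"{header}\t{sum(len(s) for s in seq)}")
    return "\n".join(out) + "\n"


def merge_outputs(outputs):
    """Merge per-chunk outputs by simple concatenation, in chunk order."""
    return "".join(outputs)


fasta = ">a\nACGT\n>b\nAC\nGT\nA\n>c\nG\n"
chunks = split_fasta(fasta, records_per_chunk=2)
merged = merge_outputs(run_tool(c) for c in chunks)
print(merged, end="")
```

The point of the sketch is that splitting is only safe on record boundaries and merging is only safe when per-chunk outputs can be combined without changing the result, which is why both steps belong to the datatype rather than to the tool.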
Thanks Peter.

I see, <parallelism> works on a single large file by splitting it and using multiple instances to process the pieces in parallel.

In our case we use a 'composite' data type, simply an array of input files, and we would like to process them in parallel instead of having a 'foreach' loop in the tool wrapper. Is it possible?

We are now looking at CloudMan for creating a cluster in Galaxy.

-Alex

-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock@googlemail.com]
Sent: Thursday, 7 February 2013 9:09 PM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing
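For illustration only, the 'foreach' loop over an array of input files that Alex describes could be replaced inside a single wrapper process by something like the sketch below. This is not a Galaxy feature and does not say whether <parallelism> supports composite datatypes, which the thread leaves open. The names process_one and process_all are hypothetical, the per-file step is a placeholder, and the work stays on one node rather than being spread over a cluster.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def process_one(path):
    """Hypothetical per-file step; a real wrapper might launch an external command here."""
    with open(path) as handle:
        return os.path.basename(path), sum(1 for _ in handle)


def process_all(paths, max_workers=4):
    """Process every input file concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, paths))


# Build three small files that stand in for the members of a composite dataset.
workdir = tempfile.mkdtemp()
paths = []
for index in range(3):
    path = os.path.join(workdir, f"part_{index}.txt")
    with open(path, "w") as handle:
        handle.write("line\n" * (index + 1))
    paths.append(path)

results = process_all(paths)
print(results)
```

Threads are used here on the assumption that the real per-file work is an external program or I/O bound; for CPU-bound Python code a process pool would be the more usual choice.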
participants (2)
- Alex.Khassapov@csiro.au
- Peter Cock