Tool params passed to python dynamic job runner
Hi List,

I've run into a frustration recently with passing values to the dynamic job runner from a tool. Here's what's going on:

in ncbi_blastn_wrapper.xml:

    <param name="pathToModule" type="hidden" value="/abs/path/to/module" />
    ...

in dynamic_runner.py:

    ...
    def blastn_wrapper(job):
        incoming = dict( [ ( p.name, p.value ) for p in job.parameters ] )
        modulePath = incoming["pathToModule"]
        modulePath.replace("\"", "").strip()
        dummy = '/abs/path/to/module'
        equals = (modulePath == dummy)
        log.debug( '\n~%s~\n~%s~\nsame? %s' % (mypath, dummy, equals) )
    ...

log:

    ~"/abs/path/to/module"~
    ~/abs/path/to/module~
    same? False

I have no idea why these "" are persisting, or if that is even the problem. I can't find the modulePath when using os.path.exists(modulePath). What I want to do is optionally (user checks a box) run another python script that splits up the blast queries into a bunch of jobs and submits each job separately - and I was just going to import this other script using sys.append(modulePath) in some poor attempt to keep the system cleaner.

If this is a problem with my understanding of python, please enlighten me - if there are other tools that pass around paths or if there's another way to do this in a less hacky way, I'm open to advice.

Thanks!

Carrie Ganote
I am not sure why the quotation marks are there in the first place, but I do see the problem. replace and strip return new strings; they don't modify the existing variable. This is demonstrated below:

% python
>>> x = ' "Moo Cow" '
>>> x
' "Moo Cow" '
>>> x.replace("\"", "").strip()
'Moo Cow'
>>> x
' "Moo Cow" '
>>> x=x.replace("\"", "").strip()
>>> x
'Moo Cow'

You will want to change the line:

    modulePath.replace("\"", "").strip()

to

    modulePath = modulePath.replace("\"", "").strip()

You should also take a look at the Galaxy task splitting framework. It already has the ability to split up blast jobs into multiple tasks. I am not sure the dynamic job runner is what you want for that use case.

-John

On Wed, Aug 7, 2013 at 12:26 PM, Ganote, Carrie L <cganote@iu.edu> wrote:
> Hi List,
>
> I've run into a frustration recently with passing values to the dynamic job runner from a tool. Here's what's going on:
>
> in ncbi_blastn_wrapper.xml:
>     <param name="pathToModule" type="hidden" value="/abs/path/to/module" />
>     ...
>
> in dynamic_runner.py:
>     ...
>     def blastn_wrapper(job):
>         incoming = dict( [ ( p.name, p.value ) for p in job.parameters ] )
>         modulePath = incoming["pathToModule"]
>         modulePath.replace("\"", "").strip()
>         dummy = '/abs/path/to/module'
>         equals = (modulePath == dummy)
>         log.debug( '\n~%s~\n~%s~\nsame? %s' % (mypath, dummy, equals) )
>     ...
>
> log:
>     ~"/abs/path/to/module"~
>     ~/abs/path/to/module~
>     same? False
>
> I have no idea why these "" are persisting, or if that is even the problem. I can't find the modulePath when using os.path.exists(modulePath). What I want to do is optionally (user checks a box) run another python script that splits up the blast queries into a bunch of jobs and submits each job separately - and I was just going to import this other script using sys.append(modulePath) in some poor attempt to keep the system cleaner.
>
> If this is a problem with my understanding of python, please enlighten me - if there are other tools that pass around paths or if there's another way to do this in less hacky way, I'm open to advice.
>
> Thanks!
>
> Carrie Ganote
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
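A minimal sketch of the fix John describes, together with the import-path step Carrie mentions. The helper names clean_path and add_to_import_path are made up for illustration and are not part of Galaxy; note also that the list to extend is sys.path (sys itself has no append).

```python
import sys


def clean_path(raw_value):
    """Return raw_value without double quotes or surrounding whitespace.

    str.replace and str.strip return new strings, so the result has to be
    returned (or assigned) rather than discarded.
    """
    return raw_value.replace('"', "").strip()


def add_to_import_path(module_dir):
    """Append module_dir to sys.path once so modules in it can be imported."""
    if module_dir not in sys.path:
        sys.path.append(module_dir)


raw = '"/abs/path/to/module"'      # value as it appeared in the log
module_path = clean_path(raw)      # reassign: the original string is unchanged
add_to_import_path(module_path)
```

After this, module_path compares equal to '/abs/path/to/module', while raw still holds the quoted form.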
Hi John,

That was it. I feel silly. I still have a lot of tooth-cutting to do on python!

I saw the parallel tags in the Blast tool and was very intrigued, but couldn't find reference to it in the read-the-docs or on the Galaxy wiki. Perhaps there is some documentation of this that I missed?

In our case, the python splitting program is doing this:
* Take the blast query
* Split the sequences up
* For each sequence, submit the query and the command to a queue on a RabbitMQ server (Consumers are set up to listen for queries and then run the jobs).
* Write each result to a temp file
* When all of the sequence jobs are finished, concat the files back in the correct order and write to the output file Galaxy expects

I made a wrapper for this splitter and it works fine on its own. Now I'm trying to add this functionality (run on AMQP) as a user-available option on the Blast tool. So for my dynamic runner, I need to know whether to send the job to DRMAA or to this AMQP python script. Hopefully that makes more sense...

Thanks so much for the help!

-Carrie
________________________________________
From: jmchilton@gmail.com [jmchilton@gmail.com] on behalf of John Chilton [chilton@msi.umn.edu]
Sent: Wednesday, August 07, 2013 3:52 PM
To: Ganote, Carrie L
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Tool params passed to python dynamic job runner
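The split / run / merge steps Carrie lists can be sketched roughly as follows. This is only an illustration under stated assumptions: a thread pool stands in for the RabbitMQ queue and its consumers, results are kept in memory rather than in temp files, and split_fasta, run_one and run_split are hypothetical names, not her actual script.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text):
    """Split FASTA text into one record (header plus sequence lines) per entry."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def run_one(record):
    """Hypothetical stand-in for one remote BLAST job; returns a result string."""
    header = record.splitlines()[0]
    return "result for %s" % header[1:]


def run_split(text, worker=run_one, max_workers=4):
    """Run worker on every record and join the results in the original order."""
    records = split_fasta(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whatever order jobs finish in.
        results = list(pool.map(worker, records))
    return "\n".join(results)
```

The ordering guarantee is the point of the sketch: whichever job finishes first, the merged output follows the order of the input sequences.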
On Wed, Aug 7, 2013 at 3:33 PM, Ganote, Carrie L <cganote@iu.edu> wrote:
> Hi John,
>
> That was it. I feel silly. I still have a lot of tooth-cutting to do on python!
>
> I saw the parallel tags in the Blast tool and was very intrigued, but couldn't find reference to it in the read-the-docs or on the Galaxy wiki. Perhaps there is some documentation of this that I missed?

No, I don't think there is documentation, unless you count the code base and the mailing list archive. I think setting use_tasked_jobs to True in universe_wsgi.ini might be all you need to do to start splitting such blast inputs. I think the parallelism tag in the tool file describes how to split the inputs.

    # This enables splitting of jobs into tasks, if specified by the particular tool config.
    # This is a new feature and not recommended for production servers yet.
    #use_tasked_jobs = False

I don't use this functionality (at least not in this fashion) so I don't have a lot of advice. Otherwise, if you have an AMQP thing working you should probably just stick with that; it sounds like a perfectly good way to go.

-John

> In our case, the python splitting program is doing this:
> * Take the blast query
> * Split the sequences up
> * For each sequence, submit the query and the command to a queue on a RabbitMQ server (Consumers are set up to listen for queries and then run the jobs).
> * Write each result to a temp file
> * When all of the sequence jobs are finished, concat the files back in the correct order and write to the output file Galaxy expects
>
> I made a wrapper for this splitter and it works fine on its own. Now I'm trying to add this functionality (run on AMQP) as a user-available option on the Blast tool. So for my dynamic runner, I need to know whether to send the job to DRMAA or to this AMQP python script. Hopefully that makes more sense...
>
> Thanks so much for the help!
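The decision Carrie describes for the dynamic runner (DRMAA or the AMQP script, depending on a checkbox) might look something like the sketch below. Everything named here is an assumption: the parameter name use_amqp, the two destination strings, and the fake job object are invented for illustration. The only thing borrowed from the thread is that job.parameters yields objects with name and value, and that values can arrive wrapped in double quotes.

```python
from types import SimpleNamespace

# Hypothetical names: neither the parameter nor the destination ids come from the thread.
AMQP_DESTINATION = "amqp_splitter"
DRMAA_DESTINATION = "drmaa_default"


def param_value(job, name, default=""):
    """Look up one job parameter and strip the quotes seen in the log output."""
    incoming = dict((p.name, p.value) for p in job.parameters)
    return incoming.get(name, default).replace('"', "").strip()


def choose_destination(job):
    """Return the AMQP destination if the user ticked the box, else DRMAA."""
    wants_amqp = param_value(job, "use_amqp").lower() == "true"
    return AMQP_DESTINATION if wants_amqp else DRMAA_DESTINATION


# Stand-in for a Galaxy job: only .parameters with .name/.value is assumed.
fake_job = SimpleNamespace(parameters=[
    SimpleNamespace(name="use_amqp", value='"true"'),
    SimpleNamespace(name="pathToModule", value='"/abs/path/to/module"'),
])
```

What a real rule function should return depends on the Galaxy version and job configuration, so the strings above would need to be replaced with whatever the local setup expects.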
On Thu, Aug 8, 2013 at 2:38 AM, John Chilton <chilton@msi.umn.edu> wrote:
> On Wed, Aug 7, 2013 at 3:33 PM, Ganote, Carrie L <cganote@iu.edu> wrote:
>> Hi John,
>>
>> That was it. I feel silly. I still have a lot of tooth-cutting to do on python!
>>
>> I saw the parallel tags in the Blast tool and was very intrigued, but couldn't find reference to it in the read-the-docs or on the Galaxy wiki. Perhaps there is some documentation of this that I missed?
>
> No, I don't think there is documentation, unless you count the code base and the mailing list archive. I think setting use_tasked_jobs to True in universe_wsgi.ini might be all you need to do to start splitting such blast inputs. I think the parallelism tag in the tool file describes how to split the inputs.
Yes, there are two basic forms: splitting into chunks of a set size (which is what the BLAST tools and my other wrappers use; for FASTA files this is a given number of sequences) or into a target number of parts.
> # This enables splitting of jobs into tasks, if specified by the particular tool config.
> # This is a new feature and not recommended for production servers yet.
> #use_tasked_jobs = False
>
> I don't use this functionality (at least not in this fashion) so I don't have a lot of advice. Otherwise, if you have an AMQP thing working you should probably just stick with that; it sounds like a perfectly good way to go.
>
> -John
>
>> In our case, the python splitting program is doing this:
>> * Take the blast query
>> * Split the sequences up
>> * For each sequence, submit the query and the command to a queue on a RabbitMQ server (Consumers are set up to listen for queries and then run the jobs).
>> * Write each result to a temp file
>> * When all of the sequence jobs are finished, concat the files back in the correct order and write to the output file Galaxy expects
That's pretty much what the BLAST+ wrappers do already via Galaxy's parallel / task splitting. When your cluster is not under full load, this gives faster processing for individual jobs. The downside is more IO, making the cluster as a whole less productive (if it was normally under high usage). We use use_tasked_jobs = True on our Galaxy instance.
>> I made a wrapper for this splitter and it works fine on its own. Now I'm trying to add this functionality (run on AMQP) as a user-available option on the Blast tool. So for my dynamic runner, I need to know whether to send the job to DRMAA or to this AMQP python script. Hopefully that makes more sense...
If using the parallel / task splitting as it is doesn't work, I would suggest trying to re-use the Galaxy datatype definition classes and their split/merge methods (which in the case of many formats is non-trivial). For instance, merging XML files needs a bit more care, and this work is done for BLAST XML.

But ideally could you integrate AMQP as an alternative cluster backend which can be called instead of DRMAA etc?

Regards,

Peter
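The two splitting forms Peter mentions (chunks of a set size, or a target number of parts) can be illustrated with a small sketch. This is not Galaxy's implementation; split_to_size and split_to_parts are hypothetical helpers operating on a plain list of records.

```python
def split_to_size(records, chunk_size):
    """Split records into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def split_to_parts(records, parts):
    """Split records into at most `parts` consecutive chunks of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional record each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        chunks.append(records[start:start + size])
        start += size
    return chunks


queries = ["seq%d" % i for i in range(1, 8)]  # seven placeholder records
```

With a fixed chunk size the number of tasks grows with the input, whereas a target number of parts keeps the task count bounded and lets the chunk size vary instead.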
participants (3)
- Ganote, Carrie L
- John Chilton
- Peter Cock