More helpful cluster job names
Hello all,

We're not planning to try the new "run cluster jobs as user" functionality (at least, not just yet), so for now all our SGE jobs show on the cluster via qstat or any other monitoring tool as belonging to the user "galaxy".

Currently the jobs are submitted as shell scripts "galaxy_NNN.sh" where NNN is the Galaxy job number. This is almost no help at all for a cluster administrator to use for monitoring or diagnostics. Given qstat truncates the command to 10 letters anyway, I typically see things like "galaxy_42." for "galaxy_42.sh", so once job numbers exceed three digits it becomes ambiguous.

To me there are three things that could be done with the script name which would make it far more instructive in this regard.

First, move the job number nearer the start. I'd find just "gNNN.sh" more useful than "galaxy_NNN.sh" simply because of the truncation in qstat.

Second, unless configured to "run jobs as the user", include the user name. I would suggest the Galaxy user's "Public Name" (which Galaxy should ensure avoids potential problem characters in filenames) or perhaps the pre-domain part of their email address, e.g. "gNNN_USER.sh"

Third, the Galaxy tool ID (which will contain dashes and underscores, but no spaces etc.), e.g. "gNNN_TOOL.sh". Even with truncation, this would make it easy to tell NCBI BLAST jobs apart from TopHat or whatever.

Obviously using both, e.g. "gNNN_USER_TOOL.sh", wouldn't work so well with the truncation in qstat, but you should still see enough of the username to make it informative.

Does this make sense? It would seem to be a small tweak needed in lib/galaxy/jobs/runners/drmaa.py where there is currently also a hard coded path bug: https://bitbucket.org/galaxy/galaxy-central/issue/715/

Regards,
Peter
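The truncation problem described above can be illustrated with a small sketch. It assumes the 10-character display width reported in this message (other SGE versions or qstat options may differ); `truncate` and `shown_names` are hypothetical helper names, not part of Galaxy or SGE.

```python
# Assumption: qstat shows only the first 10 characters of the name,
# as reported in this thread. Other setups may use a different width.
QSTAT_NAME_WIDTH = 10


def truncate(name, width=QSTAT_NAME_WIDTH):
    """Return the name as a fixed-width qstat column would display it."""
    return name[:width]


def shown_names(template, job_ids):
    """Map each displayed (truncated) name to the job IDs that share it."""
    shown = {}
    for job_id in job_ids:
        shown.setdefault(truncate(template.format(job_id)), []).append(job_id)
    return shown


job_ids = [42, 1234, 1235]
current = shown_names("galaxy_{}.sh", job_ids)   # current naming scheme
proposed = shown_names("g{}.sh", job_ids)        # job number moved to the front

print(current)   # jobs 1234 and 1235 collapse onto the same displayed name
print(proposed)  # each job keeps a distinct displayed name
```

With the "galaxy_" prefix, only three characters remain for the job number, so four-digit job numbers become indistinguishable; the shorter "g" prefix leaves room for the full number.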
'qstat -f | grep jobname' will give you the full jobname without truncation. Personally, I think the qstat reporting gives too little information. I've written a perl script to parse the output of qstat -f to give a bit more information. So truncation of jobnames for qstat shouldn't be an issue.

On Fri, Feb 3, 2012 at 5:40 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> Given qstat truncates the command to 10 letters anyway, I typically see things like "galaxy_42." for "galaxy_42.sh" so once job numbers exceed three digits it becomes ambiguous.

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
On Fri, Feb 3, 2012 at 11:55 AM, Ryan Golhar <ngsbioinformatics@gmail.com> wrote:
> 'qstat -f | grep jobname' will give you the full jobname without truncation.

Sadly not on our version of SGE - which appears to be SGE 6.2u5 - you get a bit more information but the job name is still truncated as before.

> Personally, I think the qstat reporting gives too little information.

I agree.

> I've written a perl script to parse the output of qstat -f to give a bit more information. So truncation of jobnames for qstat shouldn't be an issue.

Maybe you're right that it is possible to interrogate SGE for the full name, but I still think Galaxy's job names could be more informative.

Regards,
Peter
On Feb 3, 2012, at 7:30 AM, Peter Cock wrote:
> On Fri, Feb 3, 2012 at 11:55 AM, Ryan Golhar <ngsbioinformatics@gmail.com> wrote:
>> 'qstat -f | grep jobname' will give you the full jobname without truncation.
>
> Sadly not on our version of SGE - which appears to be SGE 6.2u5 - you get a bit more information but the job name is still truncated as before.
>
>> Personally, I think the qstat reporting gives too little information.
>
> I agree.

Try qstat -j <job_id>

>> I've written a perl script to parse the output of qstat -f to give a bit more information. So truncation of jobnames for qstat shouldn't be an issue.
>
> Maybe you're right that it is possible to interrogate SGE for the full name, but I still think Galaxy's job names could be more informative.

This was implemented for the PBS runner, and I've carried over the format from that to the DRMAA runner in e4d1dd3bdd0d. The format is:

g<job_id>_<tool_id>_<user_email>

--nate

> Regards,
> Peter
On Feb 6, 2012, at 1:05 PM, Nate Coraor wrote:
> This was implemented for the PBS runner, I've carried over the format from that to the DRMAA runner in e4d1dd3bdd0d. The format is:
>
> g<job_id>_<tool_id>_<user_email>

Correction, you'll need 74b6319b38b4 since the only valid characters in a DRMAA job name are alphanumeric or underscores.

--nate
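As an illustration of the naming format and the character restriction mentioned above, here is a minimal sketch. It is not the code from either changeset; `drmaa_job_name` is a hypothetical helper, the tool ID and email address are made-up examples, and replacing each disallowed character with an underscore (rather than dropping it) is an assumption.

```python
import re


def drmaa_job_name(job_id, tool_id, user_email):
    """Build g<job_id>_<tool_id>_<user_email>, keeping only [A-Za-z0-9_].

    Assumption: characters outside that set (such as '@', '.' or '-')
    are replaced with underscores rather than removed.
    """
    raw = "g{}_{}_{}".format(job_id, tool_id, user_email)
    return re.sub(r"[^A-Za-z0-9_]", "_", raw)


# Hypothetical job number, tool ID and user email, for illustration only.
name = drmaa_job_name(1234, "ncbi_blastn_wrapper", "user@example.org")
print(name)  # g1234_ncbi_blastn_wrapper_user_example_org
```

Because the job number comes first, a truncated display of this name still identifies the job, and the tool ID follows immediately after it.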
On Mon, Feb 6, 2012 at 6:17 PM, Nate Coraor <nate@bx.psu.edu> wrote:
> Correction, you'll need 74b6319b38b4 since the only valid characters in a DRMAA job name are alphanumeric or underscores.

Thanks Nate - I'll check this out when I'm back in the office.

Peter
participants (3)
- Nate Coraor
- Peter Cock
- Ryan Golhar