pbs runner deserializes server names as unicode
Hi there,
I've run into an issue with the pbs runner involving pbs server name strings becoming 'unicode' rather than 'str'. The pbs_python lib can't pass unicode strings down to the C library, which results in this failure:
https://gist.github.com/anonymous/9352218
The very top line of log output in the above gist is due to a line I added to runners/pbs.py:
https://gist.github.com/anonymous/9352101#file-pbs-py-L16
I can reproduce this as follows:
1. Start a job using pbs runner.
2. While the job is still running, restart galaxy.
3. Error now appears in the logs.
It seems that this is happening because, when deserializing job.destination_params from the database, the strings are converted to unicode. The jobs in this state are effectively stuck, since galaxy can't connect to check on their status.
I'm not sure where the best place to address this would be; thoughts?
This is on an up to date galaxy-dist with Torque 2.5.13.
Thanks,
Eric
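
The type change described above can be seen with a plain JSON round trip. This is only a minimal sketch, not Galaxy's code: the function name, the `pbs_server` key, and the hostname are hypothetical, and it assumes the parameters pass through JSON on their way to and from the database.

```python
import json


def type_names_after_json_round_trip(params):
    """Serialize params to JSON and back, then report each value's type name."""
    restored = json.loads(json.dumps(params))
    return dict((key, type(value).__name__) for key, value in restored.items())


# "pbs_server" is a hypothetical key; the hostname is a placeholder.
report = type_names_after_json_round_trip({"pbs_server": "torque.example.org"})
print(report)
```

On Python 2 the reported type is 'unicode' even though a 'str' went in, which matches the symptom after a restart; on Python 3 both are 'str'.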
Hey Eric,
Odd - is MSI still using tool runner URLs in universe_wsgi.ini, or have you guys transitioned to job_conf.xml? If you are using a job_conf.xml file, out of curiosity, are you explicitly declaring an XML encoding at the top of it? Not that this would be the wrong thing to do - I am just trying to understand why you guys are the first to encounter this bug.
Regardless, does this hack get you anywhere?
https://gist.github.com/jmchilton/9353654
-John
(In reply to the message from Eric Badger <badger@msi.umn.edu> of Tue, Mar 4, 2014 at 12:46 PM.)
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hey John,
We're using job_conf.xml, which does not specify an encoding. The XML parsing doesn't seem to be at issue; if you omit the step of restarting galaxy, jobs go through just fine. Only when galaxy is restarted while the job is running (and destination_params must therefore be reloaded) does the issue appear.
The provided hack does the trick; I'd done up something similar myself but felt like I might've been going about it the wrong way. I'll forge ahead with it though if it looks to be the right way.
I'm getting this, by the way, on a freshly cloned galaxy too, not just on the existing MSI instances.
Thanks,
Eric
(In reply to the message from John Chilton of Tue, Mar 04, 2014 at 01:30:00PM -0600.)
Eric,
Right - it is obviously not an XML encoding issue based on your original e-mail. Sorry about that - I still wish I understood why others have not encountered this or when the regression was introduced. Regardless - I have committed this patch to galaxy-central. Let me know if there is anything else I can do.
-John
(In reply to the message from Eric Badger <badger@msi.umn.edu> of Tue, Mar 4, 2014 at 1:53 PM.)
Brilliant, thanks John.
I was poking through history while trying to figure this out, and it seems like the conditions to cause this have been around for quite a while; pure chance that no one has noticed it? It's quite possible that other runners have this as well (haven't checked); anywhere a 'str' goes through json at some point, a 'unicode' will appear if it needs to be reloaded...
Anyway, thanks for pushing in the fix for the pbs runner.
Eric
(In reply to the message from John Chilton of Fri, Mar 07, 2014 at 08:22:16AM -0600.)
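
For runners that might be affected in the same way, one possible approach is to coerce reloaded parameters back to the native 'str' type before they reach a C binding. The following is a rough sketch under stated assumptions, not the patch committed to galaxy-central: `to_native_str`, the `pbs_server` key, and the hostname are all hypothetical names chosen for illustration.

```python
import json
import sys

PY2 = sys.version_info[0] == 2
# On Python 2 the text type returned by json.loads is `unicode`;
# on Python 3 it is already `str`, so no conversion is needed.
text_type = unicode if PY2 else str  # noqa: F821


def to_native_str(value, encoding="utf-8"):
    """Recursively convert decoded JSON so all text is the native str type."""
    if PY2 and isinstance(value, text_type):
        return value.encode(encoding)
    if isinstance(value, dict):
        return dict(
            (to_native_str(k, encoding), to_native_str(v, encoding))
            for k, v in value.items()
        )
    if isinstance(value, list):
        return [to_native_str(item, encoding) for item in value]
    return value


# Simulate parameters being saved and then reloaded after a restart.
original = {"pbs_server": "torque.example.org", "retries": 3}
reloaded = to_native_str(json.loads(json.dumps(original)))
```

Converting once, at the point where the parameters are reloaded, would cover every later use of the values; converting only at the call into the C library is narrower but touches less code. Which is preferable depends on the runner.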
participants (2)
- Eric Badger
- John Chilton