Stalled data set imports, where should I be looking?
I've created a new data library and have added two datasets using the admin interface to provide file paths to two large-ish files. In the admin interface they're still showing up as "Information: This job is queued", but no jobs show up in the admin job list, system load is zero, and I see no mention of the files in the log past the initial web requests that kicked off the process.

I do have "track_jobs_in_database = True" and "enable_job_recovery = True", and I have previously killed off some data-set-addition jobs that seemed stalled and _were_ using CPU.

Where should I be looking for blocked data set import jobs? Do load jobs make it into the database too? Is there a table I can clear out to remove the import jobs that I interrupted by deleting their library previously? Is there anything I should be looking for in the logs? Is there a handle I can jiggle?

I'm running rev 2c7acb546d6d.

Thanks!

--
Ry4an Brase  612-626-6575
University of Minnesota Supercomputing Institute for Advanced Computational Research
http://www.msi.umn.edu
Ry4an Brase wrote:
I've created a new data library and have added two datasets using the admin interface to provide file paths to two large-ish files. In the admin interface they're still showing up as "Information: This job is queued" but no jobs show up in the admin job list, system load is zero, and I see no mention of the files in the log past the initial web-requests that kicked off the process.
I do have "track_jobs_in_database = True" and "enable_job_recovery = True", and I have previously killed off some data-set-addition jobs that seemed stalled and _were_ using CPU.
Where should I be looking for blocked data set import jobs? Do load jobs make it into the database too? Is there a table I can clear out to remove the import jobs that I interrupted by deleting their library previously? Is there anything I should be looking for in the logs? Is there a handle I can jiggle?
Hi Ry4an,

Could you find these jobs in the database and see what the value of their 'state' column is? If no job action was ever logged in the server log from which you can determine the job id, you can check job_to_output_library_dataset (unless you have added a lot of library datasets since these two became stuck).

Thanks,
--nate
I'm running rev 2c7acb546d6d.
Thanks!
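The lookup Nate suggests can be sketched in isolation. Here is a toy illustration using an in-memory sqlite database; the real Galaxy database is PostgreSQL, the table and column names follow the thread where given, and the rest of the simplified schema (e.g. the ldda_id column) is an assumption for illustration only:

```python
import sqlite3

# Toy stand-in for the Galaxy tables mentioned in the thread: find the
# job rows behind stuck library datasets by joining through
# job_to_output_library_dataset back to the job table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE job_to_output_library_dataset (job_id INTEGER, ldda_id INTEGER);
    INSERT INTO job (state) VALUES ('queued'), ('ok');
    INSERT INTO job_to_output_library_dataset VALUES (1, 101);
""")

# Read each linked job's state to see whether it is stuck in 'queued'.
rows = conn.execute("""
    SELECT j.id, j.state
    FROM job AS j
    JOIN job_to_output_library_dataset AS jo ON jo.job_id = j.id
""").fetchall()
print(rows)  # [(1, 'queued')]
```

If the join returns no rows at all, as it turned out to here, no job record was ever created for the datasets.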
On Wed, Aug 25, 2010 at 10:27:37AM -0400, Nate Coraor wrote:
Hi Ry4an,
Could you find these jobs in the database and see what the value of their 'state' column is? If no job action was ever logged in the server log from which you can determine the job id, you can check job_to_output_library_dataset (unless you have added a lot of library datasets since these two became stuck).
I can't find entries in the job table corresponding to the loads. Indeed, the most recent create_time in there is 2010-08-20 (this system is not yet in use) and I kicked these imports off yesterday. This aligns with the lack of entries in the admin job view -- everything is in 'ok' or 'error'.

The job_to_output_library_dataset is empty.

In the 'dataset' table I do see entries that correspond to the additions, and their state is 'queued'. And so are 12,000 other entries from an ill-conceived load that I tried to cancel on the 20th:

galaxy=# select count(state) from dataset where state = 'queued';
 count
-------
 12187

All those datasets were in a library I deleted after I realized that 1.2TB of whatever a researcher happens to have on their external drive shouldn't be bulk imported as a single library.

Is there a chance that whatever dequeues import jobs is still trying to chew through those 12,000 entries despite the library having been deleted (the datasets aren't showing up as deleted)? There are no errors in the logs and zero system load, but 12K is a lot of imports to wait in line behind...

Could I safely clear out those queued datasets, or am I playing with internal-referential-integrity fire at that point?

Thanks,
-- Ry4an Brase
Ry4an Brase wrote:
On Wed, Aug 25, 2010 at 10:27:37AM -0400, Nate Coraor wrote:
Hi Ry4an,
Could you find these jobs in the database and see what the value of their 'state' column is? If no job action was ever logged in the server log from which you can determine the job id, you can check job_to_output_library_dataset (unless you have added a lot of library datasets since these two became stuck).
I can't find entries in the job table corresponding to the loads. Indeed, the most recent create_time in there is 2010-08-20 (this system is not yet in use) and I kicked these imports off yesterday. This aligns with the lack of entries in the admin job view -- everything is in 'ok' or 'error'.
The job_to_output_library_dataset is empty.
Okay, so job records were never created for these datasets. I am not sure why, though, it should have been done when the dataset objects themselves were created. Although your situation below may have been related.
In the 'dataset' table I do see entries that correspond to the additions and their state is 'queued'. And so are 12,000 other entries from an ill-conceived load that I tried to cancel on the 20th:
galaxy=# select count(state) from dataset where state = 'queued';
 count
-------
 12187
All those datasets were in a library I deleted after I realized that 1.2TB of whatever a researcher happens to have on their external drive shouldn't be bulk imported as a single library.
Heh. Indeed.
Is there a chance that whatever dequeues import jobs is still trying to chew through those 12,000 entries despite the library having been deleted (the datasets aren't showing up as deleted)? There are no errors in the logs and zero system load, but 12K is a lot of imports to wait in line behind...
If you just deleted the library, there actually isn't anything that dequeues them. Are any jobs in a non-terminal state? That'd be 'new', 'upload' (although this state is only used for uploads through the browser), 'queued', 'running' or 'setting_metadata' (only used when 'Auto-Detect' is clicked).
Could I safely clear out those queued datasets or am I playing with internal-referential-integrity fire at that point?
I'd update them all to 'discarded' and set the deleted column to True.

--nate
Thanks,
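Nate's cleanup suggestion amounts to a single UPDATE statement. Here is a sketch run against a toy in-memory sqlite copy of the 'dataset' table; only the state and deleted columns come from the thread, the rest of the schema is a simplifying assumption, and the real Galaxy database is PostgreSQL:

```python
import sqlite3

# Toy copy of the 'dataset' table with two stuck 'queued' rows and one
# healthy 'ok' row, mimicking the situation in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, state TEXT, deleted INTEGER)")
conn.executemany(
    "INSERT INTO dataset (state, deleted) VALUES (?, 0)",
    [("queued",), ("queued",), ("ok",)],
)

# The cleanup itself: mark every stuck 'queued' dataset as discarded
# and flag it deleted, as Nate suggests.
conn.execute("UPDATE dataset SET state = 'discarded', deleted = 1 WHERE state = 'queued'")

stuck = conn.execute("SELECT count(*) FROM dataset WHERE state = 'queued'").fetchone()[0]
discarded = conn.execute(
    "SELECT count(*) FROM dataset WHERE state = 'discarded' AND deleted = 1").fetchone()[0]
print(stuck, discarded)  # 0 2
```

The WHERE clause leaves terminal-state rows ('ok', 'error') untouched, which is what makes the bulk update safe to run.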
On Wed, Aug 25, 2010 at 11:30:33AM -0400, Nate Coraor wrote:
The job_to_output_library_dataset is empty.
Okay, so job records were never created for these datasets. I am not sure why, though, it should have been done when the dataset objects themselves were created. Although your situation below may have been related.
Is there a chance that whatever dequeues import jobs is still trying to chew through those 12,000 entries despite the library having been deleted (the datasets aren't showing up as deleted)? There are no errors in the logs and zero system load, but 12K is a lot of imports to wait in line behind...
If you just deleted the library, there actually isn't anything that dequeues them. Are any jobs in a non-terminal state? That'd be 'new', 'upload' (although this state is only used for uploads through the browser), 'queued', 'running' or 'setting_metadata' (only used when 'Auto-Detect' is clicked).
All 24 jobs are in either ok, error, or deleted.
Could I safely clear out those queued datasets or am I playing with internal-referential-integrity fire at that point?
I'd update them all to 'discarded' and set the deleted column to True.
Done. I bounced the Galaxy instance after doing so for good measure.

I tried importing a new single (big) fasta file and it's in a similar state. It shows up as queued in the admin interface, but isn't actually running - system load is zero.

I did get this error after clicking submit when importing, but since it looked 'view-level' I discounted it. The 'message' field was empty.

Error Traceback:
TypeError: 'NoneType' object is not iterable
URL: http://galaxy.msi.umn.edu/library_common/upload_library_dataset
Module weberror.evalexception.middleware:364 in respond
>>  app_iter = self.application(environ, detect_start_response)
Module paste.debug.prints:98 in __call__
>>  environ, self.app)
Module paste.wsgilib:539 in intercept_output
>>  app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__
>>  return self.application(environ, start_response)
Module galaxy.web.framework.middleware.remoteuser:107 in __call__
>>  return self.app( environ, start_response )
Module paste.httpexceptions:632 in __call__
>>  return self.application(environ, start_response)
Module galaxy.web.framework.base:145 in __call__
>>  body = method( trans, **kwargs )
Module galaxy.web.controllers.library_common:978 in upload_library_dataset
>>  **kwd )
Module galaxy.web.controllers.library_common:1158 in upload_dataset
>>  message=util.sanitize_text( message ),
Module galaxy.util:138 in sanitize_text
>>  for c in text:
TypeError: 'NoneType' object is not iterable

Could that be resulting in the stuff half-imported?

Thanks,
-- Ry4an Brase
Ry4an Brase wrote:
I tried importing a new single (big) fasta file and it's in a similar state. It shows up as queued in the admin interface, but isn't actually running - system load is zero.
I did get this error after clicking submit when importing, but since it looked 'view-level' I discounted it. The 'message' field was empty.
Error Traceback:
TypeError: 'NoneType' object is not iterable
URL: http://galaxy.msi.umn.edu/library_common/upload_library_dataset
Module weberror.evalexception.middleware:364 in respond
>>  app_iter = self.application(environ, detect_start_response)
Module paste.debug.prints:98 in __call__
>>  environ, self.app)
Module paste.wsgilib:539 in intercept_output
>>  app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__
>>  return self.application(environ, start_response)
Module galaxy.web.framework.middleware.remoteuser:107 in __call__
>>  return self.app( environ, start_response )
Module paste.httpexceptions:632 in __call__
>>  return self.application(environ, start_response)
Module galaxy.web.framework.base:145 in __call__
>>  body = method( trans, **kwargs )
Module galaxy.web.controllers.library_common:978 in upload_library_dataset
>>  **kwd )
Module galaxy.web.controllers.library_common:1158 in upload_dataset
>>  message=util.sanitize_text( message ),
Module galaxy.util:138 in sanitize_text
>>  for c in text:
TypeError: 'NoneType' object is not iterable
could that be resulting in the stuff half-imported?
Sure could. This is an interface error, but it's masking a deeper problem. There's an error occurring somewhere higher up, but 'message' is somehow set to None, and then in trying to display the error to you, it fails to sanitize the message text since it's not a string.

Are you importing via a server directory or filesystem paths?

--nate
Thanks,
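The failure mode Nate describes is easy to reproduce in isolation. Below is a minimal stand-in for a character-by-character sanitizer -- not Galaxy's actual util.sanitize_text -- showing how a None message produces this exact TypeError, plus the obvious defensive guard:

```python
# Minimal stand-in for a text sanitizer that, like the real one,
# iterates over its argument character by character. This is NOT
# Galaxy's util.sanitize_text; it only demonstrates the failure mode.
def sanitize_text(text):
    valid = set("abcdefghijklmnopqrstuvwxyz0123456789 _-.")
    return "".join(c if c.lower() in valid else "X" for c in text)

print(sanitize_text("hello world"))  # hello world

# Passing None -- as happens when the upstream error message was never
# set -- blows up before any actual sanitizing takes place:
try:
    sanitize_text(None)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable

# A defensive caller would coerce None to an empty string first:
def safe_sanitize(text):
    return sanitize_text(text or "")

print(repr(safe_sanitize(None)))  # ''
```

Note that the TypeError surfaces in the error-display path, which is why it masks whatever went wrong higher up.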
On Wed, Aug 25, 2010 at 12:07:16PM -0400, Nate Coraor wrote:
>>  message=util.sanitize_text( message ),
Module galaxy.util:138 in sanitize_text
>>  for c in text:
TypeError: 'NoneType' object is not iterable
could that be resulting in the stuff half-imported?
Sure could. This is an interface error, but it's masking a deeper problem. There's an error occurring somewhere higher up, but 'message' is somehow set to None, and then in trying to display the error to you, it fails to sanitize the message text since it's not a string.
Are you importing via a server directory or filesystem paths?
It's an import from a filesystem path to a specific file (not a directory). I've tried with the message box containing both the empty string and some text.

-- Ry4an Brase
On Wed, Aug 25, 2010 at 11:10:37AM -0500, Ry4an Brase wrote:
On Wed, Aug 25, 2010 at 12:07:16PM -0400, Nate Coraor wrote:
>>  message=util.sanitize_text( message ),
Module galaxy.util:138 in sanitize_text
>>  for c in text:
TypeError: 'NoneType' object is not iterable
could that be resulting in the stuff half-imported?
Sure could. This is an interface error, but it's masking a deeper problem. There's an error occurring somewhere higher up, but 'message' is somehow set to None, and then in trying to display the error to you, it fails to sanitize the message text since it's not a string.
Are you importing via a server directory or filesystem paths?
It's an import from a filesystem path to a specific file (not a directory). I've tried with the message box containing both the empty string and some text.
As I'm sure you found, the message text box was a red herring; I was running into HTTP return code + return message errors. In the successful case, the statuscode was None instead of 200 in lib/galaxy/web/controllers/library_common.py's get_path_paste_uploaded_datasets. Greg fixed it on line 1270 in changeset a2680959be31.

We have an upgrade scheduled, but for now I changed 'None' to 200 and got it going.

Thanks a ton for your help,

-- Ry4an Brase
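The control flow Ry4an describes can be sketched as follows. All function and variable names here are illustrative, not Galaxy's real ones: the handler reported a statuscode of None even on success, so a caller treating anything other than 200 as an error took the failure branch with the message still set to None -- which is exactly what then crashed sanitize_text:

```python
# Buggy handler: a successful import still reports statuscode=None.
def fetch_dataset(path):
    statuscode, message = None, None   # bug: success not reported as 200
    return statuscode, message

# Workaround from the thread: normalize success to 200 at the source.
def fetch_dataset_fixed(path):
    statuscode, message = 200, None
    return statuscode, message

def caller(fetch):
    statuscode, message = fetch("/data/big.fasta")
    if statuscode != 200:
        # None != 200, so the buggy handler lands here with message=None,
        # which later blows up when the error text is sanitized.
        return ("error", message)
    return ("ok", None)

print(caller(fetch_dataset))        # ('error', None)
print(caller(fetch_dataset_fixed))  # ('ok', None)
```

The one-character-ish fix (None to 200) works because the caller only distinguishes 200 from everything else.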
Ry4an Brase wrote:
As I'm sure you found, the message text box was a red herring; I was running into HTTP return code + return message errors. In the successful case, the statuscode was None instead of 200 in lib/galaxy/web/controllers/library_common.py's get_path_paste_uploaded_datasets. Greg fixed it on line 1270 in changeset a2680959be31.
We have an upgrade scheduled, but for now I changed 'None' to 200 and got it going.
Thanks a ton for your help,
Excellent, and I'm glad you found it since I was looking at the current version of the code and couldn't see how your situation was possible. ;)

--nate
participants (2):
- Nate Coraor
- Ry4an Brase