Orphan dataset created by history Export to File (possibly a bug?)
hi guys,

I have been trying to clean up some old histories and datasets on our local Galaxy, and I spotted one very old, very large dataset file that would not get removed no matter what I did.

I checked the reports webapp: that dataset isn't on the list of largest unpurged data files, even though it is bigger than the files that are listed.

I then checked the database: no history or library is associated with that dataset. Finally, I found its trace in job_to_input_dataset, so I could identify the job id and discovered that the file was created by the history "Export to File" feature. There is also a corresponding entry in job_export_history_archive.

Since such datasets are not associated with any history or library, the documented dataset cleanup method does not work on them. Any suggestions?

Regards,
Derrick
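For anyone tracking down the same thing, below is a minimal sketch of the kind of query that can list these export archives, assuming a standard Galaxy schema. The table names come from the thread, but the join columns (job_export_history_archive.dataset_id, dataset.purged, and so on) are assumptions to double-check against your own database before relying on the results.

    # Sketch only: list datasets referenced by job_export_history_archive that are
    # not attached to any history or library. Table names come from the thread;
    # the column names are assumptions -- verify against your schema first.
    import psycopg2  # on MySQL, MySQLdb/pymysql expose the same DB-API calls

    conn = psycopg2.connect("dbname=galaxy user=galaxy")  # hypothetical connection settings
    cur = conn.cursor()
    cur.execute("""
        SELECT d.id, d.file_size, jeha.job_id
          FROM job_export_history_archive jeha
          JOIN dataset d ON d.id = jeha.dataset_id
         WHERE d.purged = false
           AND NOT EXISTS (SELECT 1 FROM history_dataset_association hda
                            WHERE hda.dataset_id = d.id)
           AND NOT EXISTS (SELECT 1 FROM library_dataset_dataset_association ldda
                            WHERE ldda.dataset_id = d.id)
    """)
    for dataset_id, file_size, job_id in cur.fetchall():
        print("dataset %s: %s bytes, written by export job %s" % (dataset_id, file_size, job_id))
    cur.close()
    conn.close()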
Cleaning up the datasets manually is the best suggestion for now. We're planning to enhance the cleanup scripts to automatically delete history export files soon.

Best,
J.
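Until the scripts catch up, a rough sketch of what that manual clean-up could look like is below. Everything instance-specific here is an assumption: the default disk layout (file_path/NNN/dataset_<id>.dat, with NNN the dataset id divided by 1000), the dataset.deleted and dataset.purged columns, and the example path and id. Verify all of it on your own instance before deleting anything.

    # Sketch of manually purging one orphaned export dataset found with a query
    # like the one above. The directory layout and column names are assumptions.
    import os
    import psycopg2

    FILE_PATH = "/path/to/galaxy-dist/database/files"  # hypothetical; your file_path setting
    dataset_id = 12345                                 # hypothetical orphaned dataset id

    # Assumed default object store layout: <file_path>/<id // 1000, zero-padded>/dataset_<id>.dat
    data_file = os.path.join(FILE_PATH, "%03d" % (dataset_id // 1000), "dataset_%d.dat" % dataset_id)
    if os.path.exists(data_file):
        os.remove(data_file)

    conn = psycopg2.connect("dbname=galaxy user=galaxy")  # hypothetical connection settings
    cur = conn.cursor()
    cur.execute("UPDATE dataset SET deleted = true, purged = true WHERE id = %s", (dataset_id,))
    conn.commit()
    cur.close()
    conn.close()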
Thanks Jeremy,

I am sure a manual clean-up will be trivial for us.

Cheers,
D
Hi Derrick,

If you're using Postgres, it's now possible to clean these up using galaxy-dist/scripts/cleanup_datasets/pgcleanup.py.

--nate
I hope this is not considered hijacking… I just tried to run the pgcleanup.py script and got this error message:

    scripts/cleanup_datasets/pgcleanup.py
    __load_config INFO 2012-10-01 12:47:18,628 Reading config from /mnt/ngswork/galaxy/galaxy-dist/universe_wsgi.ini
    __connect_db INFO 2012-10-01 12:47:18,637 Connecting to database with URL: postgres:///galaxy?host=/var/run/postgresql&user=galaxy
    _run ERROR 2012-10-01 12:47:18,722 Unknown action in sequence:
    _run CRITICAL 2012-10-01 12:47:18,723 Exiting due to previous error(s)
    <module> ERROR 2012-10-01 12:47:18,723 Caught exception in run sequence:
    Traceback (most recent call last):
      File "scripts/cleanup_datasets/pgcleanup.py", line 769, in <module>
        cleanup._run()
      File "scripts/cleanup_datasets/pgcleanup.py", line 158, in _run
        sys.exit(1)
    SystemExit: 1

I have not started digging into this yet… so maybe it only affects me, but I thought this information might be useful to you.

brad

--
Brad Langhorst
langhorst@neb.com
978-380-7564
On Oct 1, 2012, at 12:50 PM, Langhorst, Brad wrote:
> I hope this is not considered hijacking… I just tried to run the pgcleanup.py script and got this error message.
> [...]
Whoops. Use --help to get the options list. I'll commit a change in just a minute so that it prints the options when you don't specify an action.
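For anyone else who trips over this before that fix lands, the quickest way to see which actions and options the script accepts is to ask it directly, e.g. from the galaxy-dist directory (the same invocation path as in Brad's log):

    scripts/cleanup_datasets/pgcleanup.py --help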
Sorry Nate,

We are using MySQL, so we cannot verify this. We will be happy to try once a MySQL alternative is released.

Regards,
Derrick
participants (4)
- Derrick Lin
- Jeremy Goecks
- Langhorst, Brad
- Nate Coraor