workflow intermediate files hog memory
Hi All

Intermediate files in a workflow often make up the large majority of a workflow's output and, when this is an NGS analysis, this volume can be HUGE. This is a considerable concern for me as we consider implementing a local install of Galaxy. Storing all of this seems useless (once the workflow has been worked out) and a huge memory hog if one wants to actually persist the useful final outputs of workflows in Galaxy.

Is there any way to specify that the output of particular steps in a workflow be deleted (or sent to /tmp) upon successful workflow completion? How are others dealing with this? Is it inadvisable to use Galaxy to serve as a repository of results?

Thanks
Mark

This message may contain confidential information. If you are not the designated recipient, please notify the sender immediately, and delete the original and any copies. Any use of the message by you is prohibited.
Hey Mark,

Galaxy by default saves everything, as you've noticed. In workflows, you can flag outputs, after which intermediate (unflagged) steps will be 'hidden' in the history, but you can't automatically delete them, though this is something we've wanted to do for a while. Unfortunately it requires rewriting the workflow execution model, so it's a larger task.

As a stopgap measure, being able to wipe out those 'hidden' datasets in one step would probably be useful. I'd actually thought this was already implemented as an option in the history panel menu, but I don't see it now. I'm creating a Trello card now for adding that method, and there's already one for the deletion of intermediate datasets.

-Dannon

On Feb 6, 2013, at 7:07 AM, mark.rose@syngenta.com wrote:
> Is there any way to specify that the output of particular steps in a workflow be deleted (or sent to /tmp) upon successful workflow completion?
Hi Dannon

I'm presuming that wiping hidden files as you suggest would eliminate them entirely, so that there would be no "history" of them in the history. That seems less than desirable if you want a record of how the analysis was performed. It seems that it would be better if the steps in the history persisted and only their output files were removed. Or is there another way of keeping track of this?

Thanks for your help
Mark

-----Original Message-----
From: Dannon Baker [mailto:dannonbaker@me.com]
Sent: Wednesday, February 06, 2013 8:40 AM
To: Rose Mark USRE
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] workflow intermediate files hog memory

> As a stopgap measure, being able to wipe out those 'hidden' datasets in one step would probably be useful.
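The behaviour Mark asks for (keep each step's entry in the history, drop only the file it produced) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Galaxy code: `DatasetRecord`, `purge_hidden`, and every field name and path below are hypothetical stand-ins, and the sketch only updates in-memory records rather than touching Galaxy's database or files on disk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for one history entry (not Galaxy's real model)."""
    name: str
    tool: str                 # tool that produced the dataset: the provenance we keep
    visible: bool             # False for intermediate outputs hidden by the workflow
    file_path: Optional[str]  # set to None once the file contents are dropped
    size_bytes: int = 0
    purged: bool = False


def purge_hidden(history: List[DatasetRecord]) -> int:
    """Drop the contents of hidden datasets but keep their records.

    Returns the number of bytes that would be freed.
    """
    freed = 0
    for record in history:
        if not record.visible and not record.purged:
            freed += record.size_bytes
            # A real implementation would delete the file on disk here.
            record.file_path = None
            record.size_bytes = 0
            record.purged = True
    return freed


# Hypothetical four-step NGS history; only the first and last outputs are flagged.
history = [
    DatasetRecord("reads.fastq", "upload", True, "/data/1.dat", 500),
    DatasetRecord("aligned.sam", "mapper", False, "/data/2.dat", 2000),
    DatasetRecord("sorted.bam", "sorter", False, "/data/3.dat", 800),
    DatasetRecord("variants.vcf", "caller", True, "/data/4.dat", 10),
]
freed = purge_hidden(history)
print(freed)  # 2800
```

After the call the history still lists all four steps with the tool that produced each one, so the record of how the analysis was performed survives; only the two hidden intermediates have lost their file contents.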
participants (2)
- Dannon Baker
- mark.rose@syngenta.com