On Wed, Nov 13, 2013 at 10:34 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Tue, Nov 12, 2013 at 7:13 PM, Ben Gift <corn8bit2@gmail.com> wrote:
I'm working with a lot of data on a cluster (condor). If I save all the workflow intermediate data, as Galaxy does by default (and rightfully so), it fills the drives.
How can I tell Galaxy to use /tmp/ to store all intermediate data in a workflow, and keep only the result?
You can't. For a start, /tmp is usually machine-specific, so files written to /tmp on one cluster node are probably not visible from the other cluster nodes, and different stages of the workflow are likely to be run on different nodes.
I imagine I'll have to work on how Galaxy handles jobs, but I'm hoping there is something built in for this.
Workflows can mark the output datasets, and the rest are automatically hidden/deleted on successful completion (but kept and visible on request via the history menu).
It might be nice if we could make that more aggressive and actually purge the intermediate files from disk as well?
The ability to have these deleted is not available, but it should be an option. Here is the most relevant Trello card: https://trello.com/c/YfLGkJKe
Even this small step will probably require tracking some concept of a running workflow in the database or a message queue. I don't think this is being done currently, but I think Dannon is working on the queue piece.
Once that is in place, there are still many things that could be done better in this arena. Nate has mentioned building functionality into object stores and job planning so that data could be pre-staged where it needs to be ahead of time in a workflow. Along similar lines, one could also imagine implementing/configuring an object store that simply wrote files that are pre-marked for deletion (once implemented) to faster staging/scratch disk on the cluster. Having this advanced planning logic built in is probably a prerequisite to allowing the use of named pipes or in-memory data files some day.
There are a lot of things to work on and there is a long way to go. I have created a Trello card for this and will link to this thread, but it should probably be spelled out more concretely and broken into multiple cards: https://trello.com/c/dUMOHHmM
-John
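To make the bookkeeping discussed above a little more concrete, here is a minimal sketch of how one might decide when an intermediate dataset becomes safe to purge. It is not Galaxy code and does not use any Galaxy API: the function name `plan_purges`, the `(step, inputs, outputs)` tuple layout, and the file names are all hypothetical, chosen only for illustration. It assumes a linear list of steps in execution order and a set of datasets the user marked as workflow outputs.

```python
def plan_purges(steps, marked_outputs):
    """Return {step_name: [datasets that could be purged once that step succeeds]}.

    steps: list of (step_name, inputs, outputs) tuples in execution order
           (hypothetical layout, not a Galaxy data structure).
    marked_outputs: set of dataset names the user wants to keep.
    """
    produced_by = {}  # dataset -> step that created it
    last_use = {}     # dataset -> last step that still needs it on disk

    for name, inputs, outputs in steps:
        # A later consumer pushes back the point at which a dataset can go.
        for ds in inputs:
            if ds in produced_by:  # never touch the user's original inputs
                last_use[ds] = name
        # A freshly produced dataset is, so far, only needed by its producer.
        for ds in outputs:
            produced_by[ds] = name
            last_use[ds] = name

    plan = {name: [] for name, _, _ in steps}
    for ds, step in last_use.items():
        if ds not in marked_outputs:
            plan[step].append(ds)
    for datasets in plan.values():
        datasets.sort()
    return plan


if __name__ == "__main__":
    example_steps = [
        ("trim", ["raw.fastq"], ["trimmed.fastq"]),
        ("align", ["trimmed.fastq"], ["aligned.sam"]),
        ("sort", ["aligned.sam"], ["sorted.bam"]),
        ("call", ["sorted.bam"], ["variants.vcf"]),
    ]
    for step, datasets in plan_purges(example_steps, {"variants.vcf"}).items():
        print(step, datasets)
```

The point of the sketch is that a dataset can only be purged after its last consumer has finished successfully, which is why some record of the running workflow (in the database or a queue) is needed before anything can be removed from disk.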
Peter