We've recently run into a scenario where we need to run a very large workflow (with huge data in the intermediate steps) many times. We can't do this because Galaxy copies the data from every intermediate step to all nodes, which would bog down the servers too much.

I asked about something similar before, and John mentioned that a feature to automatically delete intermediate step data once a workflow completes was coming soon. Is that a feature now? That would help.

Ultimately, though, we can't be copying all this data around to all nodes. The network just isn't good enough, so I have an idea.

What if we had an option on the 'run workflow' screen to run on only one node (unfortunately giving up Galaxy's neat concurrency for that workflow)? Then it would propagate only the final step's data.

Or maybe copy only to a couple of other nodes, to keep some concurrency.

If the job errors in this case, I think it should either throw out all the data or propagate the data from the step where it stopped.
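
To make the idea concrete, here is a rough sketch of the copy policy I have in mind. This is not Galaxy code: `Step`, `plan_copies`, `pinned_node`, `failed_at`, and `keep_partial` are all names I made up for illustration, and it only decides which nodes would receive each step's output.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    is_final: bool = False  # only final steps produce data needed everywhere


def plan_copies(steps, nodes, pinned_node, failed_at=None, keep_partial=False):
    """Return {step name: [nodes that receive its output]}.

    Hypothetical policy, not Galaxy's behavior. failed_at is the index of
    the step that errored, or None if the workflow succeeded.
    """
    others = [n for n in nodes if n != pinned_node]
    plan = {}
    for i, step in enumerate(steps):
        if failed_at is not None and i >= failed_at:
            break  # the failed step and everything after it produced nothing
        plan[step.name] = [pinned_node]  # intermediate data stays on one node
        if failed_at is None and step.is_final:
            plan[step.name] += others  # only final data is propagated
    if failed_at is not None:
        if not keep_partial:
            return {}  # option 1: throw out all the data
        if failed_at > 0:
            # option 2: propagate the output of the last completed step
            plan[steps[failed_at - 1].name] += others
    return plan


steps = [Step("align"), Step("sort"), Step("report", is_final=True)]
nodes = ["n1", "n2", "n3"]
print(plan_copies(steps, nodes, pinned_node="n1"))
```

On success only "report" would be copied to n2 and n3. With `failed_at=2` the result is either empty, or "sort" gets propagated when `keep_partial=True`.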

I've been trying to implement this myself, but it's taking me a long time. I've only just started to understand the pyramid stack, and I'm adding the checkbox to the run.mako template. I still need to learn the database schema, message passing, how jobs are stored, and how to tell Condor to use only one node in Galaxy (and more, I'm sure). I'm drowning.

This seems like a really important feature, though, as Galaxy gains traction as a research tool for bigger projects that demand working with huge data and running huge workflows many, many times.