Dear Assaf and everybody else,
I can only echo what you said: great work! I have also run into
similar problems. In particular, when working with workflows that have,
say, 50 different steps, things can become very confusing. It would
help if one could define outputs of the workflow and hide all the steps
in the history that are inside the workflow and not related to inputs
Another feature that I would find very helpful in designing larger
workflows would be the ability to use workflows within a larger workflow.
In my case I have a set of tasks that have to be repeated using several
different settings within a larger workflow.
I realize that workflows are still in beta and that it might be too
early to ask for such features... but it would be great to see them in
Thanks a lot for your efforts!
On 14.11.2008, at 22:15, Assaf Gordon wrote:
Recently, users (of our local Galaxy server) started using
workflows, and are very pleased. However, as workflows get more
complicated, it gets harder to track the input and output of the
I'd like to share an example, to illustrate the problems that we
The workflow (pictured in the attached 'workflow.jpg') takes 4 input
datasets, and produces 4 output datasets.
The first problem is that there's no way to differentiate between
the input datasets (they appear simply as "Step 1: Input dataset",
"Step 2: Input dataset", etc.). Since each dataset has a specific
role, I've had to print the workflow and give the users instructions
as to which dataset (in their history) goes into which input. (see
The second problem is that whenever I change something in the
workflow and save it - the order of the datasets changes!
So what was once dataset 1 can now be dataset 2, 3 or 4.
Users have no way of knowing this... (keen users might notice that
the description of the first tool changed from "Output dataset
'output' from step 2" to "Output dataset 'output' from step 4" - but
this is very obscure...).
The third problem is that once the workflow completes, the resulting
datasets have cryptic names such as "Join two queries on Data 10 and
Data 2". Since "Data 10" is "Awk on Data 8", data-8 is
"Annotations on Data 7 and Data 1" and data-7 is "Intersect data 1
and data 6" - it gets a bit hard to know what's going on. (see
For the time being, I've simply given written instructions on what each
dataset means (see attached
If I may suggest a feature - it would be great if I could name a
dataset inside the workflow. Instead of naming it "Input dataset" I
could give it a descriptive name, so even if the order of the input
datasets changes, users will know which dataset goes into which input.
Regarding the output dataset names, the 'label' option in the tools'
XML is a good start, but still creates very long, hard-to-understand
Another great feature would be the possibility to add an 'output
for each step in the workflow.
Regardless of the above, I'd like to say (once again) that Galaxy is
a great tool, and workflows are really cool - we have several long
workflows which do wonderful things.
Thanks for reading so far,
galaxy-user mailing list
Gunnar Rätsch http://www.fml.mpg.de/raetsch
Friedrich Miescher Laboratory Gunnar.Raetsch(a)tuebingen.mpg.de
Max Planck Society Tel: (+49) 7071 601 820
Spemannstraße 39, 72076 Tübingen, Germany Fax: (+49) 7071 601 801