Slightly off-topic, but I see you have awk in your workflows. Awk
could work on text, tabular, and other formats but I'd rather not
define a new tool for each input type.
Is there a way to define a tool that accepts any type of input?
It should ideally preserve the format in the output as well.
2008/11/14 Assaf Gordon <gordon(a)cshl.edu>:
Recently, users (of our local Galaxy server) started using workflows, and
are very pleased. However, as workflows get more complicated, it gets harder
to track the input and output of the workflows.
I'd like to share an example, to illustrate the problems that we encounter.
The workflow (pictured in the attached 'workflow.jpg') takes 4 input
datasets, and produces 4 output datasets.
The first problem is that there's no way to differentiate between the input
datasets (they appear simply as "Step 1: Input dataset", "Step 2: Input
dataset", etc.). Since each dataset has a specific role, I've had to print
the workflow and give the users instructions as to which dataset (in their
history) goes into which input. (see attached
The second problem is that whenever I change something in the workflow and
save it - the order of the datasets changes!
So what was once dataset 1 can now be dataset 2, 3 or 4.
Users have no way of knowing this... (keen users might notice that the
description of the first tool changed from "Output dataset 'output' from
step 2" to "Output dataset 'output' from step 4" - but this is
The third problem is that once the workflow completes, the resulting datasets
have cryptic names such as "Join two queries on Data 10 and Data 2". Since
"Data 10" is "Awk on Data 8", and Data 8 is "Generic Annotations on Data 7
and Data 1", and Data 7 is "Intersect data 1 and data 6" - it gets a bit hard
to know what's going on. (see attached 'crosstab_history.png').
In the meantime, I've simply given written instructions on what each dataset
means (see attached 'crosstab_workflow_dataset_explnanations.jpg').
If I may suggest a feature - it would be great if I could name a dataset
inside the workflow. Instead of naming it "Input dataset" I could give it a
descriptive name, so even if the order of the input datasets changes, users
will know which dataset goes into which input.
Regarding the output dataset names, the 'label' option in the tools' XML is
a good start, but still creates very long, hard-to-understand names.
Another great feature would be the possibility to add an 'output label'
for each step in the workflow.
Regardless of the above, I'd like to say (once again) that Galaxy is a great
tool, and workflows are really cool - we have several long workflows which
do wonderful things.
Thanks for reading so far,
galaxy-user mailing list