Just in case it wasn't hard enough already, I have a suggestion from one of our users (who builds quite complex workflows), if you feel so inclined: "so green shows a successful job and red is failed - a nice feature would be choosing the color of your output file in a workflow, i.e. I could use blue for bed files and orange for plots and purple for lists etc."

On Thu, Jul 1, 2010 at 10:10 AM, Dannon Baker <dannonbaker@me.com> wrote:
(continued, oops)
Hidden job failures would indeed be harder to identify and track down, and I'll address that in the commit I mentioned earlier.
I like the idea of using tags like $input in the names; I'll look through and see what else might make sense.
More metadata change actions are in the pipeline.
Thanks again for the detailed feedback-
Dannon
Sent from my mobile, please excuse typos!
On Jun 29, 2010, at 7:58 PM, Assaf Gordon <gordon@cshl.edu> wrote:
AWESOME! Truly amazing.
And yet, I have some comments:
Notification emails
===================

1. It seems the notification email is sent immediately when the workflow is submitted, before the job is completed. I've only tested it a few times, but for a workflow with 5 steps (each a shell script that does "sleep 30") with the last step configured with an EmailAction, I get the email immediately, before the last step has completed.
2. Users don't really need 6-digit microsecond accuracy in the "completed time".
3. The subject line contains a literal "%s" that never gets replaced with a variable (lib/galaxy/actions/post.py:77); a sketch of the kind of fix I mean follows this list.
4. The email line says "your job "X"...", but "X" is the history name. This implicitly assumes that users run a single workflow per history, which is not the case. A more common use case (in our local instance): users load 5 FASTQ files into a single history and start 5 workflows on those files (all in the same history).
A friendlier message would be "Your workflow 'X' on dataset 'Y' in history 'Z' is complete", with X being the workflow name and Y being the first dataset used as input to the workflow (if multiple files were used, take just one); see the second sketch after this list.
Also, remember that many (most, in my case) users still call their histories "Unnamed history", so the history name alone doesn't help.
5. The time reported in the emails (for me) is not the local server time. This might be an indication of a deeper problem (e.g. my Postgres time zone is wrong).
6. Link to the Galaxy server: it's a good idea to say "at Galaxy instance XXX", but if I may ask for more: instead of just the host name, use the complete "url_for()" link, so that mirrors using the "prefix" option get the full link. If you add the "http://" prefix, most email clients automatically convert it to a clickable link (even in textual, non-HTML emails), so users will have an easier time getting to Galaxy. Example:
===
Your job 'Unnamed history' at Galaxy instance rave.cshl.edu is complete as of 2010-06-29 23:18:53.271963.
===
would be more helpful as:
===
Your job 'Unnamed history' at Galaxy instance http://rave.cshl.edu/galaxy is complete as of 2010-06-29 23:18:53.271963.
===
To ask for even more: construct a link that automatically switches to the relevant history, so users get to their completed jobs with a single click (a sketch follows).
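For item 3, a minimal sketch of the kind of fix I mean (the template and variable names are my guesses, not the actual code at post.py:77):
===
def build_subject(history_name):
    # Buggy version -- the template string is returned as-is, so the user
    # sees a literal "%s" in the subject line:
    #   return "Galaxy workflow step on history '%s'"
    # Fixed version -- actually interpolate the value:
    return "Galaxy workflow step on history '%s'" % history_name

print(build_subject("Unnamed history"))
===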
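For item 4, a sketch of the friendlier message (all parameter names here are invented for illustration):
===
def build_body(workflow_name, input_dataset_names, history_name):
    # If several datasets were used as input, naming just the first is enough.
    dataset = input_dataset_names[0] if input_dataset_names else "(no input)"
    return "Your workflow '%s' on dataset '%s' in history '%s' is complete." % (
        workflow_name, dataset, history_name)

print(build_body("FASTQ QC", ["sample1.fastq"], "Unnamed history"))
===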
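For item 6: if I remember right, the url_for() from the Routes library Galaxy is built on accepts qualified=True, which yields an absolute URL (scheme + host + configured prefix). Outside a request context the link can also be assembled by hand; a sketch, where the history-switching path is purely my guess:
===
# Sketch only: the "/history/switch_to_history?hist_id=..." path is a guess,
# not a confirmed Galaxy route; adjust to whatever endpoint actually exists.
def history_link(scheme, host, prefix, history_id):
    return "%s://%s%s/history/switch_to_history?hist_id=%s" % (
        scheme, host, prefix, history_id)

print(history_link("http", "rave.cshl.edu", "/galaxy", "42"))
# -> http://rave.cshl.edu/galaxy/history/switch_to_history?hist_id=42
===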
Hidden Datasets
===============

1. There's no way to look at hidden datasets (you've mentioned that you're working on that).
2. Hidden datasets are still counted in the "history list" view, which is a bit confusing (though I'm not sure I have a good suggestion, because of the next item).
3. Two usability issues with hidden datasets: a long-running job whose dataset is hidden is confusing. Imagine a workflow with a paired-end mapping step that is hidden: it could take hours, but the history pane will show nothing happening, only green datasets (the previous workflow steps) and grey datasets (those that haven't run yet).
A failed hidden job (I didn't test this, just thought about it): if a hidden job fails, the (unhidden) jobs that follow it will also fail, but it won't be immediately obvious how to trace the failure back to its origin.
Combining 2 and 3, it might be more useful if all datasets were displayed until the entire workflow completes successfully, and only then hidden (some sort of automatic post-action on the last workflow step). If any step in the workflow fails (or the workflow hasn't completed yet), everything stays visible. (I must admit I don't have a good solution for a workflow with multiple final outputs.) A sketch follows.
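A sketch of that deferred-hide idea, with entirely hypothetical names (nothing here reflects Galaxy's actual internals):
===
def apply_deferred_hides(steps):
    # While any step is still running or has failed, leave every dataset
    # visible so the user can watch progress and find a failing step.
    if any(step["state"] != "ok" for step in steps):
        return
    # All steps succeeded: only now apply the deferred hide post-actions.
    for step in steps:
        if step["hide_on_success"]:
            step["output_visible"] = False

steps = [
    {"state": "ok", "hide_on_success": True, "output_visible": True},
    {"state": "ok", "hide_on_success": False, "output_visible": True},
]
apply_deferred_hides(steps)
print(steps[0]["output_visible"])  # False: hidden only after full success
===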
Dataset Rename
==============

It would be great to be able to use templates/variables in the new name (e.g. "$input"), so that each step's output name could include its parameters or the name of its input. (Remember that my users run the same workflow multiple times in the same history, so a fixed name creates multiple datasets with the same name, which is a bit confusing.) A sketch follows.
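A sketch of how the templating could work using just the Python stdlib ($input is the proposed variable, not an existing feature):
===
from string import Template

def render_new_name(pattern, input_name):
    # safe_substitute leaves unknown $variables intact instead of raising,
    # which is friendlier for hand-typed rename patterns.
    return Template(pattern).safe_substitute(input=input_name)

print(render_new_name("Mapped reads for $input", "sample1.fastq"))
# -> Mapped reads for sample1.fastq
===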
Metadata changes
================

It would be great to be able to change the dbkey/organism as well (in addition or as an alternative to columns/filetype).
Keep up the good work!

Thanks,
 -gordon
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
--
Dennis Gascoigne
0407 639 995
dennis.gascoigne@gmail.com