Re: [galaxy-dev] rename output dataset in workflow - input dataset variable

24 Nov 2014

All -

Certainly this is a very real and important problem.

The devteam hasn't moved on the tagging approach outlined in the dev
thread referenced by Peter, and I suspect that is because the
prevailing thought on the team is that dataset naming is not the most
appropriate abstraction for addressing this problem (though personally
I would be keen to merge a pull request for the compromise approach I
outlined if someone wants to put it together).

Outside the realm of dataset naming however - the devteam is actively
working on this problem in at least two ways.

- If one is performing an interactive analysis with a few initial
inputs - showing the structure and connection between datasets in the
history I suspect will be a more robust way to track connections and
inputs throughout an analysis than dataset names. Carl has prototyped
and demonstrated some stuff internally for showing such structures - I
would assume it is coming in a future release.

- If you have many samples - I suspect no approach based around
individual datasets will be sufficient. Dataset collections however
have been designed from the ground up with sample tracking in mind and
I think with very little effort on the part of tool developers, users
get very effective sample tracking. Dataset lists and lists of
paired datasets (say representing replicates, samples, conditions, or
patients, etc...) or more deeply nested data structures (representing
hierarchical combinations of those things) are created with element
identifiers at each level of the hierarchy that are preserved
throughout a complex analysis transparently in a way that names are
not - and with very little effort tool developers can leverage these
at merging steps - to produce reports, etc.
(bit.ly/gcc2014workflows).
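
To make the idea of element identifiers concrete, here is a minimal
sketch in plain Python. It is not Galaxy's API: the dict standing in
for a dataset list, the map_over helper, and the sample names are all
assumptions made up for illustration. It only shows how identifiers can
ride along through several steps and then be used at a merging step.

```python
# Hypothetical stand-in for a dataset list: element identifier -> data.
# None of these names come from Galaxy itself.
samples = {"patient1": "ACGTAC", "patient2": "GGCA"}


def map_over(collection, tool):
    """Apply `tool` to every element, keeping each element identifier."""
    return {identifier: tool(data) for identifier, data in collection.items()}


# Two per-sample steps; the identifiers are never touched by the tools.
trimmed = map_over(samples, lambda seq: seq[1:])
lengths = map_over(trimmed, len)

# A merging step can use the identifiers to label rows of a report.
report = "\n".join(
    f"{identifier}\t{length}" for identifier, length in lengths.items()
)
print(report)
```

The tools here never see or modify the identifiers, which is the point:
tracking happens in the structure rather than in dataset names.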

I am not claiming the problem has been solved - but I did want to
express that the devteam is working on it very actively and things
will continue to improve in this realm.

Thanks for the comments,
-John

On Fri, Nov 21, 2014 at 4:20 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
...
Yes :(
There's been some past discussion of this from a tool developer
perspective, e.g. https://trello.com/c/JnhOEqow and
http://dev.list.galaxyproject.org/Using-input-dataset-names-in-output-datase...
The best individual tool authors can do is something like
"$input.name processed with XXX" or "XXX on $input.name"
which in a long pipeline results in extremely long names with
tools sometimes prefixed and sometimes postfixed. :(
Of course, things get really complicated when a tool has multiple
input files - in some cases the tool author could regard one set of
files as primary and preserve their name/tag only,
Naming things is hard.
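
As a rough illustration of the name growth described above, the sketch
below chains three made-up tools in plain Python. The tool names, the
name_output helper, and the input file name are assumptions for
illustration only; this is not how Galaxy evaluates output labels.

```python
def name_output(input_name, tool, style):
    """Build an output name from the input name, as a tool author might."""
    if style == "prefix":
        return f"{tool} on {input_name}"
    return f"{input_name} processed with {tool}"


# Hypothetical three-step pipeline mixing both naming conventions.
steps = [("Trim", "prefix"), ("Map", "postfix"), ("Sort", "prefix")]

name = "sample1.fastq"
for tool, style in steps:
    name = name_output(name, tool, style)
    print(name)
```

After only three steps the original sample name is buried in the middle
of the label, with tool names on both sides of it.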
Peter
On Wed, Nov 19, 2014 at 8:34 PM, Curtis Hendrickson (Campus)
<curtish@uab.edu> wrote:
...
Brad et al,
I would like to second the issue you raise so succinctly. The failure to automatically
track the original sample name throughout the analysis (that and array selection
of paired end reads) is one of the biggest barriers people face for doing work on
many samples in Galaxy. It just gets very confusing unless you spend a lot of time on
workarounds (creating workflows to rename things, editing datasets individually,
etc) – especially for non-programmer users, for whom workflows with variables
and API calls are beyond the pale.
Regards,
Curtis

John Chilton