Re: [galaxy-dev] Using input dataset names in output dataset names

11 Nov 2013

*I had been composing this e-mail for a while, so it is a bit awkward
given this morning's responses, but I felt it best to just get it out
there rather than continue to bake the ideas :)*

If I were not employed by Penn State, I would say you guys should be
using galaxy-extras - these problems are all solved by multiple file
datasets :), but since I am, I am not going to mention that.

I agree with Peter: the "tag" idea is probably a better way to get
around this and likely represents an improvement on HIDs. There are
a lot of open tickets related to things like this, so I have picked one
at random and sketched out what I think the path forward might be.

https://trello.com/c/dQA7Y5vS

As mentioned by James, the problem with Peter's first attached patch
is that after several iterations the name gets longer and longer. The
tags patch put together (or at least linked to) by Bjoern should limit
the size of output names over a workflow, right? The downside is that
it is not used by default; tool authors have to opt in to it.
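To make the growth problem concrete, here is a minimal Python sketch
comparing two naming schemes over a short chain of tools. It is not
Galaxy code: nested_name, tag_name, and run_chain are hypothetical
helpers, and the "name tag" is modelled simply as a set of strings that
each output inherits unchanged from its input.

```python
from typing import Iterable, Set, Tuple


def nested_name(tool_label: str, input_name: str) -> str:
    """Hypothetical: wrap the full input name, as chained $on_string-style labels do."""
    return f"{tool_label} on ({input_name})"


def tag_name(tool_label: str, input_tags: Set[str]) -> str:
    """Hypothetical: embed only the name tags propagated from the input."""
    return f"{tool_label} on {', '.join(sorted(input_tags))}"


def run_chain(steps: Iterable[str], initial_name: str,
              initial_tags: Set[str]) -> Tuple[str, str]:
    """Apply each tool label in turn and return (nested name, tag-based name)."""
    nested = initial_name
    tagged = initial_name
    tags = set(initial_tags)  # every output inherits the same name tags
    for label in steps:
        nested = nested_name(label, nested)  # grows with every step
        tagged = tag_name(label, tags)       # stays bounded by the tag set
    return nested, tagged


nested, tagged = run_chain(["Convert", "Filter", "Sort"], "reads.fastq", {"sampleA"})
print(nested)  # Sort on (Filter on (Convert on (reads.fastq)))
print(tagged)  # Sort on sampleA
```

The nested name grows linearly with workflow depth, while the tag-based
name only depends on the last tool and the set of tags carried along.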

So, my vote would be to combine the approaches. Specify this new
labeling attribute (I would call it on_name_tag_string instead of
on_tag_string because tags have other meanings in Galaxy), then
provide a Galaxy configuration option that would use it instead of
on_string by default for all tools (or maybe just replace on_string
with on_name_tag_string), so that tools that explicitly use on_string
would pick up the enhancements as well. Galaxy Main wouldn't have to
change its default, but institutions that deem the name tag more
important could.
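A rough sketch of how such a configuration option might switch what
gets substituted for $on_string. Everything here is hypothetical: the
InputDataset class, the use_name_tags flag standing in for the proposed
Galaxy configuration option, and the HID-based format, which only
approximates Galaxy's real on_string ("data 26 and data 27").

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class InputDataset:
    """Hypothetical stand-in for a history dataset: its HID plus any name tags."""
    hid: int
    name_tags: List[str] = field(default_factory=list)


def hid_on_string(inputs: Sequence[InputDataset]) -> str:
    """Approximation of the existing HID-based on_string."""
    return " and ".join(f"data {d.hid}" for d in inputs)


def name_tag_on_string(inputs: Sequence[InputDataset]) -> str:
    """Hypothetical on_name_tag_string: the union of the inputs' name tags."""
    tags = sorted({t for d in inputs for t in d.name_tags})
    # Fall back to HIDs when no input carries a name tag.
    return ", ".join(tags) if tags else hid_on_string(inputs)


def compute_on_string(inputs: Sequence[InputDataset], use_name_tags: bool) -> str:
    """use_name_tags plays the role of the proposed site-wide config option."""
    return name_tag_on_string(inputs) if use_name_tags else hid_on_string(inputs)


inputs = [InputDataset(26, ["sampleA"]), InputDataset(27, ["sampleA", "sampleB"])]
print(compute_on_string(inputs, use_name_tags=False))  # data 26 and data 27
print(compute_on_string(inputs, use_name_tags=True))   # sampleA, sampleB
```

Because the switch happens where the value is computed, tools that
already reference $on_string would pick up the new behaviour without
changes, which is the point of making it a configuration option rather
than a per-tool attribute.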

-John

On Mon, Nov 11, 2013 at 10:22 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
...
On Mon, Nov 11, 2013 at 4:09 PM, James Taylor <james@jamestaylor.org> wrote:
...
I have not tested the patch, just read it, but won't this result in
dataset names like:
"Some operation on data 27 (Some operation on data 26 (Some other
operation on data 25 (...(...(...))))"
Potentially - it depends on how the tools use $on_string.
If the tools added a postscript you'd get:
"Original dataset (as tabular) (filtered) (...)"
Neither is ideal. I'd prefer to see something more like this tag idea:
https://trello.com/c/JnhOEqow
What about my suggestion that for simple format conversion tools
we simply reuse the input dataset's name unchanged (without
text about the conversion)? That seems a good compromise.
...
(avoiding this is why we came up with HIDs in the first place).
I don't like the HIDs - unlike dataset names, the HIDs are not entirely
reproducible - they depend on the order of upload, was it a clear
history, etc.
Peter

John Chilton