Re: [galaxy-dev] Composite datatype output for Cuffdiff

4 Mar 2013

Alex,

To reiterate what Jeremy has already said on the mailing list, this is
definitely something we want, and need, for Galaxy.  While this particular
implementation has a lot of good parts, creating these collections as
first-class composite datasets isn't ideal and we'd be stuck supporting
them going forward, forever.

There's a clear plan for implementing this in Trello
(https://trello.com/c/325AXIEr), most of which is straightforward to
implement.  The 'hard' part is really going to be implementing an ideal UI
for dealing with these collections, something which we could do in phases.

What exactly are your concerns with the implementation as set out in the
Trello card?

-Dannon

On Mon, Mar 4, 2013 at 1:32 AM, <Alex.Khassapov@csiro.au> wrote:
...
Yeah John,
This is sad. I don't understand why it is such a problem. If it's already
implemented and used in real projects like ours, then it is needed by the
community.  I don't think we have other options for our requirements; your
multiple file datasets implementation was a real saviour for us.
-Alex
-----Original Message-----
From: jmchilton@gmail.com [mailto:jmchilton@gmail.com] On Behalf Of John
Chilton
Sent: Monday, 4 March 2013 4:42 PM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Cc: <galaxy-dev@bx.psu.edu>
Subject: Re: [galaxy-dev] Composite datatype output for Cuffdiff
Hi Alex,
Thanks for the comments. The Galaxy team has made it clear here and to
me privately that this will NOT be included in the Galaxy main code base. I
hope, and am confident, that they will make grouping datasets work,
hopefully even for thousands of files.
I do not believe the two ideas are mutually exclusive, and I will be
maintaining a fork of galaxy-central with these additions; I will hopefully
set this up this week. I will do my best to respond to support requests,
make multiple file datasets and composite types in general as robust as
possible, keep up with Galaxy updates, etc.
Obviously, it is risky to let a code base drift so far from Galaxy main's,
however, and you, I, and others who might want to use them will have to
carefully weigh the risks when determining if multiple file datasets are
worth the headache.
Thanks for all your help and input. I am sorry this did not turn out
differently; I feel I have really failed here.
-John
On Sun, Mar 3, 2013 at 10:08 PM, <Alex.Khassapov@csiro.au> wrote:
...
Hi John,
Are you saying that "composite multiple file dataset" isn't required and
won't be implemented?
We are using your multiple file dataset implementation ("m:xxx" type)
and hope that eventually it will be pushed into the main Galaxy implementation.
As we are using Galaxy for CT reconstruction tools, where input and
output can consist of a couple thousand files, other options, i.e. grouping
datasets, are not feasible.
-Alex
-----Original Message-----
From: galaxy-dev-bounces@lists.bx.psu.edu
[mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of John Chilton
Sent: Thursday, 28 February 2013 2:06 AM
To: Jeremy Goecks
Cc: Jim Johnson; <galaxy-dev@bx.psu.edu>
Subject: Re: [galaxy-dev] Composite datatype output for Cuffdiff
Hey Jeremy,
I am trying to think about a path forward with this composite multiple
file dataset implementation. It seems there is consensus among the Galaxy
team that it shouldn't be included because grouping actual datasets would
be superior. In that light, I am revisiting this e-mail because, depending
on the implementation of what you described, multiple file datasets are a
specific case of this concept, with some likely uncontroversial enhancements
for the specific case of composite datatypes that are a homogeneous list of
files. Does that make any sense?
If I implemented (i) and (ii) in such a way that the multiple file
dataset stuff flowed out more organically, is there any chance that it could
be included in galaxy-central? If not, and the implicit datatypes and
parallelism stuff would remain no-gos, implementing what you described would
still benefit the multiple file datasets implementation, so I still might
do this. Would a clean implementation of just what you described be
accepted?
...
Any thoughts you or anyone has on the future direction of composite
datatypes in general would be appreciated.
...
Thanks for your time,
-John
On Fri, Oct 12, 2012 at 2:44 PM, Jeremy Goecks <jeremy.goecks@emory.edu>
wrote:
...
...
Hi Jim,
This is nice and is a path forward for the immediate future.
That said, a couple extensions to Galaxy to better support composite
datatypes would enable cummerbund without the additional tools:
(i) extending the composite datatype to include definition of
individual outputs in the collection;
(ii) extending the history panel to allow usage/selection of (1) the
complete composite set of files or (2) individual items in a
composite datatype.
Once (i) is done, (ii) should be straightforward using the new
history panel code.
Of course, the advantage of these extensions is that they'd address
both cummerbund issues as well as other challenges, such as using
output from the barcode splitter.
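[Editor's sketch, not part of the original message.] To make extensions (i) and (ii) concrete, here is a minimal Python illustration of a composite dataset that declares its individual members and lets a caller pick either the whole set or a single item. The class name CompositeDataset, its fields, and the select method are hypothetical and are not Galaxy APIs; the two member file names are only examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CompositeDataset:
    """Hypothetical stand-in for a composite dataset; not a Galaxy class."""

    name: str
    # (i) every individual output is declared explicitly: label -> file name
    members: Dict[str, str] = field(default_factory=dict)

    def select(self, label: Optional[str] = None) -> List[str]:
        """(ii) return all member files, or only the one named by label."""
        if label is None:
            return list(self.members.values())
        if label not in self.members:
            raise KeyError(f"{label!r} is not a member of {self.name!r}")
        return [self.members[label]]


# Example values only; a real cuffdiff run has many more outputs.
cuffdiff = CompositeDataset(
    name="cuffdiff run",
    members={
        "gene_exp": "gene_exp.diff",
        "isoform_exp": "isoform_exp.diff",
    },
)
```

Calling cuffdiff.select() would hand a downstream tool the complete set, while cuffdiff.select("gene_exp") would hand it a single item; how a history panel would expose that choice is not addressed here.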
J.
On Oct 11, 2012, at 6:14 PM, Jim Johnson wrote:
Checking to see if there is any interest in including a parameter
option to select outputs for cuffdiff, potentially including a
composite output and a cummeRbund sqlite database.
Issues:
  cuffdiff produces 21 output files, which is a little unwieldy in a
galaxy history.
  cummeRbund generates its database when given a cuffdiff output
directory, but manually hooking up 21 outputs to the cummerbund_wrapper
is a pain.
I've put demo code in the testtoolshed under the repository name
cummerbund:
http://jjohnson@testtoolshed.g2.bx.psu.edu/repos/jjohnson/cummerbund
This includes new datatypes defined in datatypes_conf.xml and
implemented in cuffdata.py:
      <!-- html composite dataset with cuffdiff outputs in the extra files path -->
      <datatype extension="cuffdata" type="galaxy.datatypes.cuffdata:CuffDiffData"/>
      <!-- cummeRbund SQLite database -->
      <datatype extension="cuffdatadb" type="galaxy.datatypes.cuffdata:CuffDataDB"/>
The cuffdiff wrapper has a multiple select parameter to choose which
output files to put in the history.
In addition to the 21 cuffdiff outputs, the wrapper can also generate:
  cuffdata - which is a composite HTML output with links to the 21
cuffdiff outputs
  cuffdatadb - which is the cummeRbund SQLite database
I also added utility tools:
  cuffdata_datasets - which will take files from the composite
cuffdata and copy them as datasets into the history
  cuffdata_cummerbund - which generates the cummeRbund cuffdatadb
from the composite cuffdata
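[Editor's sketch, not part of the original message.] The general idea behind a tool like cuffdata_datasets, copying chosen files out of a composite dataset's directory, can be illustrated in plain Python. This is an assumption about the approach, not the code in the repository; the function name copy_members and its parameters are hypothetical, and nothing here touches Galaxy's own APIs.

```python
import os
import shutil
from typing import Iterable, List


def copy_members(extra_files_dir: str, dest_dir: str, wanted: Iterable[str]) -> List[str]:
    """Copy the named files out of a composite dataset's directory.

    Returns the destination paths of the files that were copied.
    Names that do not exist in extra_files_dir are silently skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in wanted:
        src = os.path.join(extra_files_dir, name)
        if os.path.isfile(src):
            # shutil.copy returns the path of the newly created file
            copied.append(shutil.copy(src, os.path.join(dest_dir, name)))
    return copied
```

A real tool would then need to register each copied file as a history dataset, which is Galaxy-specific and deliberately left out of this sketch.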
I updated the cummerbund_wrapper:
  with tryCatch so that an R error on a plot won't exit the Rscript
  to include a small image of each plot on the html page
  to add plots for: dispersion, scatter matrix, MDS, and PCA
Thanks,
JJ
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this and other
Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/


Re: [galaxy-dev] Composite datatype output for Cuffdiff

Dannon Baker