New subject: integrating Galaxy with a relational data warehouse?

31 Aug 2010


Thank you, James, for your reply.  I wonder if you could elaborate on why storing the bulk of the data in a relational database seems impractical, or point me to a document where this is discussed at greater length.

Yury


On 08/31/10, James Taylor  <james@jamestaylor.org> wrote:
...
Hi Yury,
...
we are planning to build a data warehouse for a research center that utilizes multiple high-throughput experimental platforms, e.g. plate-based HTS assays, microarrays of several different types, ChIP-seq, RNA-seq.  We have been thinking of managing the data in a relational database.  Galaxy looks attractive to us for its workflow management and data provenance features, e.g. to keep track of how raw data are analyzed to produce normalized & summarized datasets and/or final sets of statistics such as p values.  We wonder how amenable Galaxy would be to integration with a relational data store.
One possible scenario might be to have Galaxy import a dataset from a relational database, run a workflow, then submit the results back to the database with the associated history or link thereto.
This is certainly a reasonable possibility. You could have a Galaxy tool for submitting data to your database. I would imagine such a tool would produce a Galaxy dataset as output with whatever unique identifier is necessary to recover exactly that data from the database for another analysis.
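A minimal sketch of what the script behind such a submission tool might look like follows. Everything in it is an assumption made for illustration: SQLite stands in for the warehouse, the `results` table and its columns are hypothetical, the input is taken to be a two-column tab-separated file of feature names and p values, and the function name `submit_results` is invented. The Galaxy tool wrapper that would call the script is not shown.

```python
import sqlite3
import uuid


def submit_results(db_path, results_path, receipt_path):
    """Load a tab-separated results file into the database and write a receipt.

    The receipt file holds the identifier needed to find exactly these rows
    again; it is the file Galaxy would keep as the tool's output dataset.
    """
    submission_id = str(uuid.uuid4())  # hypothetical identifier scheme

    # Parse the input first so a malformed file never touches the database.
    rows = []
    with open(results_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            feature, p_value = line.split("\t")
            rows.append((submission_id, feature, float(p_value)))

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "submission_id TEXT, feature TEXT, p_value REAL)"
            )
            conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    finally:
        conn.close()

    with open(receipt_path, "w") as out:
        out.write(f"submission_id\t{submission_id}\n")
    return submission_id
```

A later analysis could read the identifier from the receipt dataset and select the matching rows with `WHERE submission_id = ?`.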
...
Another possibility is to forgo the relational database altogether and do all our data management within Galaxy.
I can only give you our experience from inside Galaxy. After initial analysis we made a decision to store all data in Galaxy as files on disk, with metadata (data about data, connections between datasets, workflows, et cetera) in a relational database. We feel this decision has worked well. For the scale of data we see, as well as the wide variety of different data types, a relational database did not, and still does not, seem practical to us.
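The general pattern described here, bulk data in files with only the metadata in a relational database, can be sketched roughly as below. This is not Galaxy's actual schema or code: the `dataset` table, its columns, the file naming, and the functions `init_catalog` and `register_dataset` are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import os
import sqlite3


def init_catalog(conn):
    """Create a hypothetical metadata table; the data itself is never stored here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset ("
        "id INTEGER PRIMARY KEY, name TEXT, datatype TEXT, "
        "path TEXT, parent_id INTEGER REFERENCES dataset(id))"
    )


def register_dataset(conn, data_dir, name, datatype, payload, parent_id=None):
    """Write the payload bytes to a file and record only metadata in the database.

    parent_id links a derived dataset to the dataset it was computed from,
    which is the kind of connection a provenance record needs.
    """
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO dataset (name, datatype, path, parent_id) "
            "VALUES (?, ?, '', ?)",
            (name, datatype, parent_id),
        )
        dataset_id = cur.lastrowid
        path = os.path.join(data_dir, f"dataset_{dataset_id}.dat")
        with open(path, "wb") as handle:
            handle.write(payload)
        conn.execute(
            "UPDATE dataset SET path = ? WHERE id = ?", (path, dataset_id)
        )
    return dataset_id
```

Because the database row holds only a path and a datatype label, the same table can describe microarray files, sequencing reads, or summary statistics without a schema per data type.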
-- jt
James Taylor
Assistant Professor
Department of Biology
Department of Mathematics & Computer Science
Emory University
-- 
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680  Fax: 608-890-2427
Email: ybukhman@glbrc.wisc.edu
