Thank you, James, for your reply. I wonder if you could elaborate on why storing the bulk of the data in a relational database seems impractical, or point me to a document where this is discussed at greater length.
On 08/31/10, James Taylor <james(a)jamestaylor.org> wrote:
> Hi Yury,
> > We are planning to build a data warehouse for a research center that utilizes multiple high-throughput experimental platforms, e.g. plate-based HTS assays, microarrays of several different types, ChIP-seq, RNA-seq. We have been thinking of managing the data in a relational database. Galaxy looks attractive to us for its workflow management and data provenance features, e.g. to keep track of how raw data are analyzed to produce normalized & summarized datasets and/or final sets of statistics such as p values. We wonder how amenable Galaxy would be to integration with a relational data store.
> > One possible scenario might be to have Galaxy import a dataset from a relational database, run a workflow, then submit the results back to the database along with the associated history or a link to it.
> This is certainly a reasonable possibility. You could have a Galaxy tool for submitting data to your database. I would imagine such a tool would produce a Galaxy dataset as output with whatever unique identifier is necessary to recover exactly that data from the database for another analysis.
> > Another possibility is to forgo the relational database altogether and do all our data management within Galaxy.
> I can only give you our experience from inside Galaxy. After an initial analysis, we decided to store all data in Galaxy as files on disk, with metadata (data about data, connections between datasets, workflows, et cetera) in a relational database. We feel this decision has worked well. For the scale of data we see, as well as the wide variety of different data types, storing the data itself in a relational database did not, and still does not, seem practical to us.
> -- jt
> James Taylor
> Assistant Professor
> Department of Biology
> Department of Mathematics & Computer Science
> Emory University
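
For concreteness, the layout described in the quoted reply (dataset contents as files on disk, with metadata and links between datasets in a relational database) might be sketched roughly as below. This is only an illustration under stated assumptions, not Galaxy's actual schema or code: the table name `dataset`, its columns, and the functions `open_metadata_db`, `register_dataset` and `load_dataset` are all hypothetical, and SQLite stands in for whatever database server would really be used.

```python
import hashlib
import sqlite3
import uuid
from pathlib import Path


def open_metadata_db(path=":memory:"):
    """Open the metadata database and create the (hypothetical) table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS dataset (
            id        TEXT PRIMARY KEY,  -- unique identifier used to recover the data
            file_path TEXT NOT NULL,     -- where the bulk data lives on disk
            data_type TEXT NOT NULL,     -- e.g. 'microarray', 'rna-seq'
            sha256    TEXT NOT NULL,     -- checksum of the file contents
            parent_id TEXT REFERENCES dataset(id)  -- provenance link, may be NULL
        )
        """
    )
    return conn


def register_dataset(conn, data_dir, content, data_type, parent_id=None):
    """Write the bulk data to a file and record only metadata in the database."""
    dataset_id = uuid.uuid4().hex
    file_path = Path(data_dir) / f"{dataset_id}.dat"
    file_path.write_bytes(content)
    digest = hashlib.sha256(content).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
            (dataset_id, str(file_path), data_type, digest, parent_id),
        )
    return dataset_id


def load_dataset(conn, dataset_id):
    """Look up the file path by identifier and read the data back from disk."""
    row = conn.execute(
        "SELECT file_path FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return Path(row[0]).read_bytes()
```

In this sketch the database rows stay small regardless of how large the files are, and the `parent_id` column is one simple way to record that a normalized dataset was derived from a raw one. The returned identifier plays the role of the unique identifier mentioned above for recovering exactly the same data in a later analysis.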
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427