Thank you, James, for your reply. I wonder if you could elaborate on why storing the bulk
of the data in a relational database seems impractical, or point me to a document where
this is discussed at greater length.
On 08/31/10, James Taylor <james(a)jamestaylor.org> wrote:
> We are planning to build a data warehouse for a research center that utilizes
> multiple high-throughput experimental platforms, e.g. plate-based HTS assays, microarrays
> of several different types, ChIP-seq, RNA-seq. We have been thinking of managing the data
> in a relational database. Galaxy looks attractive to us for its workflow management and
> data provenance features, e.g. to keep track of how raw data are analyzed to produce
> normalized & summarized datasets and/or final sets of statistics such as p values. We
> wonder how amenable Galaxy would be to integration with a relational data store.
> One possible scenario might be to have Galaxy import a dataset from a relational
> database, run a workflow, then submit the results back to the database with the associated
> history or link thereto.
This is certainly a reasonable possibility. You could have a Galaxy tool for submitting
data to your database. I would imagine such a tool would produce a Galaxy dataset as
output with whatever unique identifier is necessary to recover exactly that data from the
database for another analysis.
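
As a rough illustration of that idea, the script behind such a tool might look something like the Python sketch below. It is not Galaxy's tool API: the SQLite schema, the function names (init_db, submit_results, write_receipt, fetch_results), and the JSON "receipt" file standing in for the Galaxy output dataset are all assumptions made up for this example.

```python
import json
import sqlite3
import uuid


def init_db(conn):
    """Create two hypothetical tables: one row per submission, many result rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submission ("
        "id TEXT PRIMARY KEY, history_link TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result ("
        "submission_id TEXT REFERENCES submission(id), "
        "feature TEXT, p_value REAL)"
    )


def submit_results(conn, rows, history_link):
    """Insert (feature, p_value) rows and return the new submission identifier."""
    submission_id = str(uuid.uuid4())
    with conn:  # commit on success, roll back if an insert fails
        conn.execute(
            "INSERT INTO submission (id, history_link) VALUES (?, ?)",
            (submission_id, history_link),
        )
        conn.executemany(
            "INSERT INTO result (submission_id, feature, p_value) VALUES (?, ?, ?)",
            [(submission_id, feature, p_value) for feature, p_value in rows],
        )
    return submission_id


def write_receipt(path, submission_id, history_link):
    """Write the small file the tool would hand back as its output dataset."""
    with open(path, "w") as handle:
        json.dump(
            {"submission_id": submission_id, "history_link": history_link},
            handle,
        )


def fetch_results(conn, submission_id):
    """Recover exactly the rows stored under one submission identifier."""
    cursor = conn.execute(
        "SELECT feature, p_value FROM result "
        "WHERE submission_id = ? ORDER BY feature",
        (submission_id,),
    )
    return cursor.fetchall()
```

A later analysis would read the identifier back out of the receipt file and pass it to fetch_results, so the data retrieved are exactly the rows that were submitted together with that history link.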
> Another possibility is to forgo the relational database altogether and do all our
> data management within Galaxy.
I can only give you our experience from inside Galaxy. After initial analysis we made a
decision to store all data in Galaxy as files on disk, with metadata (data about data,
connections between datasets, workflows, et cetera) in a relational database. We feel this
decision has worked well. For the scale of data we see, as well as the wide variety of
different data types, a relational database did not, and still does not, seem practical to
Department of Biology
Department of Mathematics & Computer Science
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427