best practice for data matrices with metadata
Hi All,
I'm working on building some Galaxy tools that can be used together in a workflow. One data structure that shows up a lot is a data matrix where both the rows and the columns have annotations that I would like to preserve through the workflow.
Here is an example that hopefully makes this more concrete: we have X genotyping probes in a microarray that we have run on Y samples, which results in a data matrix with X rows and Y columns. Each of the probes (rows) has annotation data like allele (A vs. B allele), sequence, SNP ID, etc., and each of the columns (samples) has its own annotation data like strain, date, etc.
So what do you think is the best way to represent this kind of structure? Does Galaxy have a mechanism that lets you associate files as a logical group, so that my intensity data and metadata stay together as a "dataset" without having to be in the same file?
Thank you,
Keith
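For readers trying to picture the structure being asked about, here is a minimal sketch in Python. It is not a Galaxy datatype or API: the class name `AnnotatedMatrix`, its fields, the `select_columns` helper, and all sample values are hypothetical and made up for illustration. It only shows the invariant the question cares about, namely that row and column annotations stay aligned with the data when a tool subsets the matrix.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnnotatedMatrix:
    """Hypothetical container: X probe rows by Y sample columns plus metadata."""
    values: List[List[float]]      # values[i][j] = intensity of probe i in sample j
    row_annotations: List[Dict]    # one dict per probe (allele, SNP ID, ...)
    col_annotations: List[Dict]    # one dict per sample (strain, date, ...)

    def __post_init__(self) -> None:
        # Reject matrices whose annotations do not line up with the data.
        if len(self.values) != len(self.row_annotations):
            raise ValueError("need exactly one row annotation per data row")
        for row in self.values:
            if len(row) != len(self.col_annotations):
                raise ValueError("need exactly one column annotation per data column")

    def select_columns(self, keep: Callable[[Dict], bool]) -> "AnnotatedMatrix":
        """Return a new matrix holding only the samples whose annotation passes `keep`."""
        idx = [j for j, ann in enumerate(self.col_annotations) if keep(ann)]
        return AnnotatedMatrix(
            values=[[row[j] for j in idx] for row in self.values],
            row_annotations=list(self.row_annotations),
            col_annotations=[self.col_annotations[j] for j in idx],
        )


# Made-up example data: 2 probes (A and B allele of one SNP) by 3 samples.
matrix = AnnotatedMatrix(
    values=[[0.91, 0.12, 0.88], [0.10, 0.95, 0.15]],
    row_annotations=[
        {"snp_id": "snp_1", "allele": "A"},
        {"snp_id": "snp_1", "allele": "B"},
    ],
    col_annotations=[
        {"sample": "s1", "strain": "strain_x"},
        {"sample": "s2", "strain": "strain_y"},
        {"sample": "s3", "strain": "strain_x"},
    ],
)

# Subsetting by a column annotation keeps data and metadata in step.
subset = matrix.select_columns(lambda ann: ann["strain"] == "strain_x")
```

Whether the three parts live in one file or in several files grouped under one dataset is a separate storage question; the sketch only fixes the in-memory shape that each tool in the workflow would need to read and write.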
I am wondering if the BioHDF project is a good way to keep the metadata.
Cheers,
Kevin
On 17-Feb-2011, at 5:11 AM, Keith Sheppard <keithshep@gmail.com> wrote:
> [...]
This seems really interesting! From a quick investigation, it looks like it targets the kind of problem I want to solve. I wonder if you (or anyone else) have experience to share with BioHDF, in terms of how easy it is to use or how much traction it is gaining in the community? The mailing list is very quiet: http://mail.hdfgroup.org/pipermail/biohdf_hdfgroup.org/
Thanks,
Keith
On Wed, Feb 16, 2011 at 4:21 PM, Kevin <aboulia@gmail.com> wrote:
> I am wondering if the biohdf project is a good way to keep the meta data.
> [...]
-- keithsheppard.name