How to set up a microarray analysis workflow in Galaxy?
Hi,

I am new to Galaxy. My co-workers have installed a Galaxy server locally and written wrappers for a couple of command-line programs. Now I would like to implement a microarray data analysis workflow using R and Bioconductor. I imagine it would have the following steps:

1) A user uploads a set of files that constitute a microarray experiment, e.g. a set of Affymetrix CEL files and a spreadsheet describing the experimental conditions (to be imported into the phenoData slot of a Bioconductor ExpressionSet object).
2) An R script is run to parse/normalize/summarize the data and generate an ExpressionSet object. This object is stored as a file.
3) Another R script takes the ExpressionSet as its input and does some downstream analysis using appropriate Bioconductor packages.

I have the following questions. My apologies if they are trivial, as I am a novice here:

1) How do I enable a non-administrative user to upload a dataset that consists of multiple files, for the first step of my workflow?
2) Will Galaxy store an .RData file?
3) Are there any other issues that I should know about before I plunge into this?

Thanks,
Yury

--
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680
Fax: 608-890-2427
Email: ybukhman@glbrc.wisc.edu
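Step 1 of the workflow above pairs a set of CEL files with a spreadsheet of experimental conditions. Before handing both to R, a wrapper script might check that the two agree. The following is only a minimal sketch of such a check, not part of Galaxy or Bioconductor: the function name `check_upload`, the tab-separated layout, and the `filename` column are all assumptions made for illustration.

```python
import csv
import io
from pathlib import PurePath


def check_upload(cel_names, pheno_text, filename_column="filename"):
    """Compare uploaded CEL file names with the rows of a phenotype table.

    Hypothetical helper: assumes a tab-separated table whose
    `filename_column` holds one CEL file name per sample.
    Returns (files_missing_from_table, rows_missing_a_file), both sorted.
    """
    reader = csv.DictReader(io.StringIO(pheno_text), delimiter="\t")
    if filename_column not in (reader.fieldnames or []):
        raise ValueError(f"phenotype table has no {filename_column!r} column")

    # File names that the phenotype table describes (blank cells are skipped).
    described = set()
    for row in reader:
        cell = (row[filename_column] or "").strip()
        if cell:
            described.add(cell)

    # Compare on base names only, since upload paths are server-specific.
    uploaded = {PurePath(name).name for name in cel_names}
    return sorted(uploaded - described), sorted(described - uploaded)


pheno = "filename\ttreatment\na.CEL\tcontrol\nb.CEL\ttreated\n"
print(check_upload(["/tmp/a.CEL", "/tmp/c.CEL"], pheno))
# prints (['c.CEL'], ['b.CEL'])
```

Here `c.CEL` was uploaded but is not described in the table, and `b.CEL` is described but was not uploaded. A wrapper could report both lists to the user instead of letting the R script fail later.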
Yury,

I have some Bioconductor-based microarray tools under development, but they're not ready for prime time yet. If you'd like to take a look, you can find them at http://galaxy.rgenetics.org, and the tool source is available for svn checkout from http://rgenetics.org/trac/rgalaxy/browser/trunk/tools/rexpression

The exact structure of a BioC expression datatype is a challenge. I currently have it implemented (see lib/galaxy/datatypes/genetics.py) as a composite datatype containing both the ExpressionSet (as an R .rData file) and, redundantly, the phenodata as a separate file. This was done so that tools such as the one-way ANOVA tool can read the phenodata and let the user select the phenodata column to use as the dichotomous analysis variable.

The challenge of managing a collection of CEL files and processing them is not yet addressed, but it would be fairly easy to implement as a new composite datatype, and I'd be interested in working on it.

Why don't we continue this conversation off the list? Eventually we can circle back when the tools are ready for production use. Anyone else interested in helping out is also welcome to contact me directly.

(In reply to Yury Bukhman's message of Thu, Sep 9, 2010 at 4:19 PM.)

--
Ross Lazarus MBBS MPH
Associate Professor, Harvard Medical School
Director of Bioinformatics, Channing Laboratory
181 Longwood Ave., Boston MA 02115, USA.
Tel: +1 617 505 4850

_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
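The reply in this thread keeps the phenodata available as a separate file so that a tool can offer the user a choice of column for the dichotomous analysis variable. As a rough illustration of that selection step only, the sketch below lists the columns of a phenodata table that take exactly two distinct values. It is not the code in lib/galaxy/datatypes/genetics.py; the function name `dichotomous_columns` and the tab-separated layout are assumptions.

```python
import csv
import io


def dichotomous_columns(pheno_text, delimiter="\t"):
    """Return the names of columns with exactly two distinct non-empty values.

    Hypothetical helper: assumes the phenodata is a delimited text table
    with a header row.
    """
    reader = csv.DictReader(io.StringIO(pheno_text), delimiter=delimiter)
    values = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in values:
            cell = (row.get(name) or "").strip()
            if cell:  # ignore blank cells rather than counting them as a level
                values[name].add(cell)
    # Header order is preserved, so the result follows the table's column order.
    return [name for name, seen in values.items() if len(seen) == 2]


pheno = (
    "sample\ttreatment\tbatch\n"
    "s1\tcontrol\t1\n"
    "s2\ttreated\t2\n"
    "s3\ttreated\t3\n"
)
print(dichotomous_columns(pheno))
# prints ['treatment']
```

A tool's parameter list could be populated from the returned names, so that only columns usable as a two-group variable are offered.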