Hi!
In our lab, we've worked on several ChIP-Seq projects and have developed dozens of scripts for analyzing the results. These scripts run several peak finders and motif discovery tools, perform various data-munging steps, optimize parameters for those programs, calculate the genomic distribution of peaks, generate summary graphs, calculate motif distributions within peaks, perform gene ontology analysis, etc.
We've been thinking about turning all of this into a standalone tool, possibly a web service, and are considering Galaxy as a way to automate the entire process and open up the tools to the biologist community. From what I've seen on Galaxy Main, and given the recent inclusion of tools like the MACS wrapper, it seems the analyses I've listed would interest the Galaxy community at large.
So I'm looking for feedback and possibly advice. Ideally, we'd like to run the entire pipeline, look at the results, change a few parameters in some of the steps (e.g., the minimum FDR cutoff), and rerun only what needs to be rerun. Galaxy workflows are easy to create, but they don't seem to have the flexibility we're looking for. Perhaps several workflows would do the job, i.e., separate workflows for the major parts of the analysis, which we could then chain together (is that possible in Galaxy?) into one uber-pipeline.
Has work like this already been done? Are there sample workflows that go beyond just calling peaks? Would the community be interested in the code and wrappers?
Thanks for the help!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587