In our lab, we've worked on several ChIP-seq projects and have developed
dozens of scripts for analyzing the results. These cover running several
peak finders and motif discovery tools, various data munging steps,
parameter optimization for those programs, calculating the genomic
distribution of peaks, generating summary graphs, calculating motif
distributions within peaks, and performing gene ontology analysis.

We've been thinking about making all of this into a standalone tool,
possibly a web service, and have been considering Galaxy as a vehicle for
automating the entire process and opening up the tools to the biologist
community. From what I've seen in Galaxy Main, and given the recent
inclusion of tools such as the MACS wrapper, it seems like the things I've
listed would be of interest to the Galaxy community at large.

So I'm looking for feedback and possibly advice. Ideally, we'd like to be
able to run the entire pipeline, look at the results, possibly change a few
parameters in some of the steps (e.g., the minimum FDR cutoff), and rerun
only what needs to be rerun. Galaxy workflows are easy to create, but they
don't seem to have the flexibility that we're looking for. Perhaps we could
build separate workflows for the major parts of the analysis and tie them
together into one uber-pipeline (is that possible in Galaxy?).
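
To make the "rerun only what needs to be rerun" idea concrete, here is a
minimal Python sketch of the behavior we have in mind. It is not Galaxy
code and not our actual scripts: the Pipeline class, the step names
(call_peaks, filter_peaks, count_peaks), the fdr_cutoff parameter name, and
the toy peak list are all hypothetical, chosen only for illustration. Each
step is fingerprinted from its own parameters plus the fingerprints of its
inputs, so changing one parameter reruns that step and everything
downstream of it, and nothing else.

```python
import hashlib
import json


class Pipeline:
    """Toy pipeline that reruns a step only when its parameters or inputs change."""

    def __init__(self):
        self.steps = []   # (name, func, dependency names), in run order
        self.cache = {}   # name -> (fingerprint, result) from earlier runs
        self.ran = []     # names of steps executed by the most recent run()

    def add_step(self, name, func, deps=()):
        # Steps must be added after the steps they depend on.
        self.steps.append((name, func, tuple(deps)))

    def run(self, params):
        self.ran = []
        results, fingerprints = {}, {}
        for name, func, deps in self.steps:
            step_params = params.get(name, {})
            # Fingerprint = step name + its parameters + fingerprints of its inputs.
            key = json.dumps(
                [name, step_params, [fingerprints[d] for d in deps]],
                sort_keys=True,
            )
            fp = hashlib.sha256(key.encode()).hexdigest()
            fingerprints[name] = fp
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fp:
                results[name] = cached[1]          # unchanged: reuse old result
            else:
                inputs = [results[d] for d in deps]
                results[name] = func(*inputs, **step_params)
                self.cache[name] = (fp, results[name])
                self.ran.append(name)
        return results


# Hypothetical stand-ins for real analysis steps (toy data, not real peaks).
def call_peaks():
    return [("peak1", 0.001), ("peak2", 0.02), ("peak3", 0.2)]


def filter_peaks(peaks, fdr_cutoff=0.05):
    return [p for p in peaks if p[1] <= fdr_cutoff]


def count_peaks(filtered):
    return len(filtered)


pipeline = Pipeline()
pipeline.add_step("call_peaks", call_peaks)
pipeline.add_step("filter_peaks", filter_peaks, deps=["call_peaks"])
pipeline.add_step("count_peaks", count_peaks, deps=["filter_peaks"])

first = pipeline.run({"filter_peaks": {"fdr_cutoff": 0.05}})
first_ran = list(pipeline.ran)    # every step runs the first time

second = pipeline.run({"filter_peaks": {"fdr_cutoff": 0.01}})
second_ran = list(pipeline.ran)   # peak calling is reused, the rest reruns
```

In this sketch, the second run skips call_peaks and reruns only
filter_peaks and count_peaks. A real version would presumably need to
fingerprint input files and tool versions as well, and to store results on
disk rather than in memory.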

Has work like this already been done? Are there sample workflows that go
beyond just calling peaks? Would the community be interested in the code +

Thanks for the help!

Xie Lab, UC Irvine