Hi!

In our lab, we've worked on several ChipSeq projects and have developed dozens of scripts for analyzing the results, including running several peak finders, several motif discovery tools, various data munging techniques, some parameter optimization for the above programs, calculating the genomic distribution of peaks, generating several summary graphs, calculating motif distributions within peaks, performing gene ontology analysis, etc.

We've been thinking about making all of this into a standalone tool, possibly a web service, and have been considering Galaxy as a vehicle for automating the entire process and opening up the tools to a biologist community. From what I've seen in Galaxy Main and the recent inclusion of e.g., the MACS wrapper, it seems like the things I've listed would be of interest to the galaxy community at large.

So I'm looking for feedback and possibly advice. Ideally, we'd like to be able to run the entire pipeline, look at the results, possibly change a few parameters in some of the steps (e.g., minimum FDR cutoff) and rerun only what needs to be rerun. Galaxy workflows are easy to create, but don't seem to have the flexibility that we're looking for. Perhaps several workflows tied together would do the job (i.e., have separate workflows for the major parts of the analysis) which we could tie together (possible in galaxy?) into one uber-pipeline.

Has work like this already been done? Are there sample workflows that go beyond just calling peaks? Would the community be interested in the code + wrappers?

Thanks for the help!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587
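The "change a few parameters and rerun only what needs to be rerun" idea in the message above can be illustrated with a small sketch. This is not Galaxy code and not the lab's pipeline: the `Pipeline` class, the step functions, and the `max_fdr` parameter are hypothetical names chosen for illustration. Each step is fingerprinted from its own parameters plus the fingerprints of its upstream steps, so changing one parameter invalidates only that step and everything downstream of it.

```python
import hashlib
import json


def fingerprint(name, params, upstream):
    """Hash a step's name, its parameters, and its upstream fingerprints."""
    payload = json.dumps([name, params, upstream], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Pipeline:
    """Toy pipeline: steps are added in dependency order and cached by fingerprint."""

    def __init__(self):
        self.steps = []   # list of (name, func, dependency names)
        self.cache = {}   # name -> (fingerprint, result)

    def add(self, name, func, deps=()):
        self.steps.append((name, func, tuple(deps)))

    def run(self, params):
        """Run all steps; return the results and the names of steps that actually ran."""
        prints, results, executed = {}, {}, []
        for name, func, deps in self.steps:
            step_params = params.get(name, {})
            fp = fingerprint(name, step_params, [prints[d] for d in deps])
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fp:
                result = cached[1]          # nothing relevant changed: reuse
            else:
                result = func(*[results[d] for d in deps], **step_params)
                self.cache[name] = (fp, result)
                executed.append(name)
            prints[name] = fp
            results[name] = result
        return results, executed


# Hypothetical stand-ins for real analysis steps (made-up peak tuples: chrom, pos, FDR).
def call_peaks():
    return [("chr1", 100, 0.001), ("chr1", 500, 0.04), ("chr2", 900, 0.2)]


def filter_peaks(peaks, max_fdr=0.05):
    return [p for p in peaks if p[2] <= max_fdr]


def summarize(peaks):
    return {"n_peaks": len(peaks)}


pipeline = Pipeline()
pipeline.add("call_peaks", call_peaks)
pipeline.add("filter_peaks", filter_peaks, deps=["call_peaks"])
pipeline.add("summarize", summarize, deps=["filter_peaks"])

first_results, first_run = pipeline.run({"filter_peaks": {"max_fdr": 0.05}})
second_results, second_run = pipeline.run({"filter_peaks": {"max_fdr": 0.01}})
```

On the second call only the filtering and summary steps execute again; the (usually expensive) peak-calling result is reused because neither its parameters nor its inputs changed.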
Jake:

Thank you for your e-mail. There has been work in this domain. Some of it comes from the Galaxy team, but one of the most impressive examples is the Cistrome project at Harvard (http://cistrome.dfci.harvard.edu/ap/), which uses Galaxy as the underlying framework. Our group and the community are very much interested in your code + wrappers. If you have already tried to port tools to Galaxy, these can be submitted to our very new community site at http://usegalaxy.org/community

Speaking of flexibility in Galaxy workflows: we are actively working on improving workflow functionality, and if you have looked at the workflows recently you might have noticed workflow actions. More is coming.

The bottom line -> the Galaxy community needs your tools = wrap, test, and submit!

Thanks,

anton
galaxy team

On Jul 21, 2010, at 3:51 PM, Jacob Biesinger wrote:
Hi!
In our lab, we've worked on several ChipSeq projects and have developed dozens of scripts for analyzing the results, including running several peak finders, several motif discovery tools, various data munging techniques, some parameter optimization for the above programs, calculating the genomic distribution of peaks, generating several summary graphs, calculating motif distributions within peaks, performing gene ontology analysis, etc.
We've been thinking about making all of this into a standalone tool, possibly a web service, and have been considering Galaxy as a vehicle for automating the entire process and opening up the tools to a biologist community. From what I've seen in Galaxy Main and the recent inclusion of e.g., the MACS wrapper, it seems like the things I've listed would be of interest to the galaxy community at large.
So I'm looking for feedback and possibly advice. Ideally, we'd like to be able to run the entire pipeline, look at the results, possibly change a few parameters in some of the steps (e.g., minimum FDR cutoff) and rerun only what needs to be rerun. Galaxy workflows are easy to create, but don't seem to have the flexibility that we're looking for. Perhaps several workflows tied together would do the job (i.e., have separate workflows for the major parts of the analysis) which we could tie together (possible in galaxy?) into one uber-pipeline.
Has work like this already been done? Are there sample workflows that go beyond just calling peaks? Would the community be interested in the code + wrappers?
Thanks for the help! -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
I see some great progress on the cistrome project. What a shame that they haven't open-sourced their efforts.

We've only just started porting and wrapping our code for Galaxy. One possible limiting factor is that a good portion of our code depends on the pygr package for python in order to extract sequence and perform genomic queries quickly. For the community, would this be too tall of an order to maintain?

Thanks again.
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587

On Wed, Jul 21, 2010 at 1:01 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:
Jake:
Thank you for your e-mail. There has been work in this domain. Some of it comes from the Galaxy team, but one of the most impressive examples is the Cistrome project at Harvard (http://cistrome.dfci.harvard.edu/ap/), which uses Galaxy as the underlying framework. Our group and the community are very much interested in your code + wrappers. If you have already tried to port tools to Galaxy, these can be submitted to our very new community site at http://usegalaxy.org/community
Speaking of flexibility in Galaxy workflows: we are actively working on improving workflow functionality, and if you have looked at the workflows recently you might have noticed workflow actions. More is coming.
The bottom line -> the Galaxy community needs your tools = wrap, test, and submit!
Thanks,
anton galaxy team
Jacob Biesinger wrote:
> I see some great progress on the cistrome project. What a shame that they haven't open-sourced their efforts.
>
> We've only just started porting and wrapping our code for Galaxy. One possible limiting factor is that a good portion of our code depends on the pygr package for python in order to extract sequence and perform genomic queries quickly. For the community, would this be too tall of an order to maintain?

Hi Jake,

We have quite a few tools that depend on outside Python modules, although none on pygr. Regardless, unless it's exceptionally difficult to install, this will be fine.

--nate
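For tools that depend on an outside Python module, one option is for the wrapped script to check for that module up front and fail with a readable message instead of a traceback. The sketch below uses only the standard library; the helper names are hypothetical and this is not an existing Galaxy mechanism. A script that needs pygr could call `require_modules(["pygr"])` before doing any work.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names):
    """Exit with a readable message if any required module is missing."""
    missing = missing_modules(names)
    if missing:
        sys.exit("Missing required Python modules: " + ", ".join(missing))


# "no_such_module_xyz" is a deliberately fake name used only for demonstration.
demo_missing = missing_modules(["json", "no_such_module_xyz"])
```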
Hi Jake,
> So I'm looking for feedback and possibly advice. Ideally, we'd like to be able to run the entire pipeline, look at the results, possibly change a few parameters in some of the steps (e.g., minimum FDR cutoff) and rerun only what needs to be rerun. Galaxy workflows are easy to create, but don't seem to have the flexibility that we're looking for. Perhaps several workflows tied together would do the job (i.e., have separate workflows for the major parts of the analysis) which we could tie together (possible in galaxy?) into one uber-pipeline.

I'm a Galaxy developer who also uses Galaxy for NGS analyses; here are my opinions about workflows.

First, separate workflows for major or time-consuming aspects of an analysis work well. Galaxy provides the ability to copy (clone) workflows, and I often copy a workflow and then add to it so that I have both the simpler workflow and the more complex one. This enables me to run either. Often, I run the complex analysis initially and use the simpler workflows to rerun particular aspects of the analysis. I've talked with others who do something similar.

What this means is that Galaxy needs the ability to support embedded workflows. Making a change to the simple workflow currently requires manually propagating the changes to the more complex workflow, which is difficult and error-prone. Embedded/nested workflows are on our development list, but they're fairly far down the list right now because other issues are more pressing.

Finally, Galaxy enables you to specify parameters that must be set at runtime, so it's possible to easily rerun Galaxy workflows with different parameter values.

Best,
J.
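To make the embedded-workflow point concrete, here is a minimal sketch of the idea in plain Python. It does not use Galaxy's API: the `Workflow` class, the step functions, and the `max_fdr` runtime parameter are hypothetical. The complex workflow holds a reference to the simple one rather than a copy, so a change to the simple workflow reaches the complex one without manual propagation, and parameters are supplied at run time.

```python
class Workflow:
    """Toy workflow: an ordered list of steps, where a step may be another Workflow."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)

    def run(self, data, params):
        for step in self.steps:
            if isinstance(step, Workflow):
                data = step.run(data, params)   # embedded (nested) workflow
            else:
                data = step(data, params)       # plain step function
        return data


# Hypothetical steps; every step receives the runtime parameters.
def keep_significant(peaks, params):
    return [p for p in peaks if p["fdr"] <= params["max_fdr"]]


def drop_chrM(peaks, params):
    return [p for p in peaks if p["chrom"] != "chrM"]


def count_by_chrom(peaks, params):
    counts = {}
    for p in peaks:
        counts[p["chrom"]] = counts.get(p["chrom"], 0) + 1
    return counts


simple = Workflow("filter", [keep_significant])
complex_wf = Workflow("filter_and_summarize", [simple, count_by_chrom])

# Made-up example data.
peaks = [
    {"chrom": "chr1", "fdr": 0.001},
    {"chrom": "chr1", "fdr": 0.04},
    {"chrom": "chr2", "fdr": 0.2},
    {"chrom": "chrM", "fdr": 0.002},
]

# Same workflow, different runtime parameter values.
loose = complex_wf.run(peaks, {"max_fdr": 0.05})
strict = complex_wf.run(peaks, {"max_fdr": 0.01})

# Editing the simple workflow is picked up by the complex one automatically.
simple.steps.append(drop_chrM)
after_change = complex_wf.run(peaks, {"max_fdr": 0.05})
```

With cloned workflows, the last change would have to be repeated by hand in every copy; with embedding it is made once.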
participants (5)
- Anton Nekrutenko
- Jacob Biesinger
- Jacob Biesinger
- Jeremy Goecks
- Nate Coraor