Again: Variable inputs in a workflow
Dear galaxy-devs,

About a year ago you discussed variable numbers of inputs to a workflow (see http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-November/012012.html and http://thread.gmane.org/gmane.science.biology.galaxy.devel/4502/focus=4502, for example). We're interested in something simple, like a number of FASTQ files that have to undergo automated quality control and trimming. A workflow would describe the steps that need to be done per file (like FastQC, a quality trimmer and so on), and at the end, a tabular dataset would summarize statistics on all files.

Of course, we could prebuild workflows for 2, 3, 4, 5, ... n input files and join the outputs, but it would be much cooler to have it variable. Since it's possible to start many instances of a workflow (i.e. one instance per file), this would be a good starting point. But how would one combine the outputs of those instances? Does anyone out there have experience with such setups? Which files are involved in starting many instances of a workflow? Any other ideas or suggestions on how to go from here?

Thanks a lot for any input!

Best,
Jens

Max-Planck-Institute for Heart and Lung Research
Bioinformatics Service - FGI
Ludwigstraße 43
61231 Bad Nauheim
Phone. +49 6032 705 1765
Mail. jens.preussner@mpi-bn.mpg.de
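A minimal sketch of the per-file / summary split described above, written in plain Python outside Galaxy. `fastq_stats` is a hypothetical stand-in for the per-file steps (it is not FastQC or any Galaxy tool), and `summarize` plays the role of the final step that gathers one row per file into a single tab-separated table. It assumes well-formed four-line FASTQ records with Phred+33 quality encoding.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def fastq_stats(name: str, handle: TextIO) -> Dict[str, object]:
    """Hypothetical per-file step: read count, mean read length, mean Phred+33 quality."""
    lines = [line.rstrip("\n") for line in handle]
    reads = bases = qual_sum = qual_chars = 0
    # Assumes strict four-line records: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        reads += 1
        bases += len(seq)
        qual_sum += sum(ord(c) - 33 for c in qual)  # Phred+33 decoding
        qual_chars += len(qual)
    return {
        "file": name,
        "reads": reads,
        "mean_length": round(bases / reads, 2) if reads else 0.0,
        "mean_quality": round(qual_sum / qual_chars, 2) if qual_chars else 0.0,
    }


def summarize(stats: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Gather step: one tab-separated row per input file, however many there are."""
    fields = ["file", "reads", "mean_length", "mean_quality"]
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in stats:
        writer.writerow(row)


if __name__ == "__main__":
    # Nothing here is fixed at 2, 3, 4, ... inputs; the dict can hold any number of files.
    inputs = {
        "sample_a.fastq": "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n###\n",
        "sample_b.fastq": "@r1\nGGGGG\n+\n55555\n",
    }
    out = io.StringIO()
    summarize((fastq_stats(n, io.StringIO(t)) for n, t in inputs.items()), out)
    print(out.getvalue(), end="")
```

The point of the split is that the per-file function never needs to know how many siblings it has; only the gather step sees the whole collection.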
Galaxy developers,

I would be particularly interested in seeing this implemented as flow control blocks. Having such blocks in workflows would vastly expand their capabilities. Ideally there would be something like:

- for (i=0 to max)
- foreach (file in files)
- while
- if (could pass off "testing" to some external bit of code/another programme, which would return 1/0)
- switch/case

This would allow you to run a copy of the workflow for each of Jens' files, and then run another foreach loop on the outputs of the first foreach to concatenate them down to a single file (or something like that to combine them).

Just my two cents.

Cheers,
Eric

On 12/10/2013 04:05 AM, Preussner, Jens wrote:
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
esr@tamu.edu
rasche.eric@yandex.ru
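Eric's foreach and if blocks can be sketched as a tiny in-process interpreter. Everything here is hypothetical (`ForEach`, `If`, `external_test` and `concatenate` are not Galaxy classes), datasets are modelled as plain strings, and the external test follows the shell convention that exit status 0 means "true". If the external programme instead returns 1 for "true", the comparison would need to be flipped.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a "dataset" is a string, a "tool" maps one dataset to another.
Tool = Callable[[str], str]


@dataclass
class If:
    """Run `then` only when `test` passes; otherwise pass the dataset through unchanged."""
    test: Callable[[str], bool]
    then: Tool

    def __call__(self, dataset: str) -> str:
        return self.then(dataset) if self.test(dataset) else dataset


@dataclass
class ForEach:
    """Apply the same chain of tools to every input independently (one 'instance' per file)."""
    body: List[Tool]

    def run(self, datasets: List[str]) -> List[str]:
        results = []
        for dataset in datasets:
            for tool in self.body:
                dataset = tool(dataset)
            results.append(dataset)
        return results


def external_test(command: List[str]) -> Callable[[str], bool]:
    """Delegate the decision to an external program fed on stdin; exit status 0 means 'true'."""
    def predicate(dataset: str) -> bool:
        return subprocess.run(command, input=dataset, text=True).returncode == 0
    return predicate


def concatenate(datasets: List[str]) -> str:
    """Second pass over the ForEach outputs: reduce them to a single dataset."""
    return "".join(datasets)


def trim_n(dataset: str) -> str:
    """Hypothetical trimming tool: drop ambiguous 'N' bases."""
    return dataset.replace("N", "")


if __name__ == "__main__":
    # External "test": exit 0 when the dataset contains an 'N'.
    has_n = external_test(
        [sys.executable, "-c", "import sys; sys.exit(0 if 'N' in sys.stdin.read() else 1)"]
    )
    per_file = ForEach([If(has_n, trim_n), str.upper])
    print(concatenate(per_file.run(["acgtN\n", "acgt\n", "NNac\n"])), end="")
```

Keeping the loop body a plain list of tools means the same sub-workflow is reused verbatim for every input, which is the property that would let a single workflow definition handle any number of files.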
In a former position I wrote extensions to Galaxy that allow workflows such as this:

https://bitbucket.org/galaxy/galaxy-central/pull-request/116/multiple-file-d...
https://bitbucket.org/msiappdev/galaxy-extras/commits/all
http://bit.ly/beyond-proteomics

So believe me, I understand this is important. I was recently hired by the Galaxy team and am now working on a more palatable variant of these ideas. More palatable also means more complex and more work to implement, unfortunately... but I am working on it.

In the meantime, have you considered driving these kinds of workflows via the API? I think the Refinery platform (https://github.com/parklab/refinery-platform), for instance, targets the Galaxy API and rewrites workflows at runtime to handle variable numbers of inputs, working around Galaxy's limitations. This would require some custom development and some mechanism outside of the traditional Galaxy channels to launch the workflows, and is probably not worth the effort unless you are talking about a small pool of fixed workflows.

-John

On Tue, Dec 10, 2013 at 4:05 AM, Preussner, Jens <Jens.Preussner@mpi-bn.mpg.de> wrote:
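The idea of rewriting a workflow at runtime can be sketched with a deliberately simplified, hypothetical workflow description: a list of steps, each with an `id`, a `tool` and the ids of the steps feeding it. This is not Galaxy's workflow JSON format and not how Refinery actually implements it. Given the per-file tool chain and a combining tool, `expand_workflow` emits one input step and one copy of the chain per file, then wires the last step of every copy into a single combining step. Translating the result into something the Galaxy API can import or run is left out.

```python
from typing import Dict, List


def expand_workflow(per_input: List[str], combine_tool: str, n_inputs: int) -> List[Dict]:
    """Hypothetical runtime rewrite: replicate a per-file tool chain n_inputs times."""
    if n_inputs < 1:
        raise ValueError("need at least one input")
    steps: List[Dict] = []
    branch_ends: List[int] = []
    for i in range(n_inputs):
        # One input step per file...
        prev = len(steps)
        steps.append({"id": prev, "tool": "input", "label": f"input_{i + 1}", "inputs": []})
        # ...followed by its own copy of the per-file chain, each step fed by the previous one.
        for tool in per_input:
            steps.append({"id": len(steps), "tool": tool, "inputs": [prev]})
            prev = len(steps) - 1
        branch_ends.append(prev)
    # A single gather step consumes the final output of every branch.
    steps.append({"id": len(steps), "tool": combine_tool, "inputs": branch_ends})
    return steps


if __name__ == "__main__":
    # Tool names are placeholders, not real Galaxy tool ids.
    for step in expand_workflow(["fastqc", "quality_trimmer"], "summary_table", 3):
        print(step)
```

The trade-off John mentions shows up here: the expansion itself is simple, but something outside Galaxy has to know the file count, generate the workflow and launch it.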
participants (3)
- Eric Rasche
- John Chilton
- Preussner, Jens