Keeping track of what you did.
Hi all,

I am currently setting up a Galaxy instance on a cluster. My team intends to analyze data from NGS experiments, and our various pipelines/workflows will be integrated into Galaxy. This may be a simple question, but I haven't found useful information on this point.

When looking back at a result file, it is crucial for us to be able to know:

- who? (generated this file)
- when? (precise date and time)
- what? (applied workflow)
- how? (parameters applied in each workflow 'box')
- and, if possible, things like computation time and other details.

Is there something in Galaxy that allows us to retrieve this information? Log files, or a table in the database that we could query?

As far as I currently understand, histories are managed on a per-user basis, allowing one to save and share histories. Does this make it possible to reproduce exactly the same pipeline on new input data, without having to re-specify the parameters used ('applying a workflow instance')? Workflows may indeed be modified/improved frequently, and we have to keep a record of which 'version' was used to generate our results.

Also, where are the output files stored? Is it possible to force Galaxy to put a result file in a pre-specified place? It seems that once you have given Galaxy the input files, you have no control over where the output files are stored (internal management), and all you have is a link to the output in your history panel. For input files, I saw that there is a way to prevent Galaxy from duplicating them on upload and thus keep your directory tree, which is especially useful when dealing with huge NGS data. Does this kind of flexibility exist for output files?

Best regards,
A.
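Much of the 'who / when / what / how' information asked about above lives in Galaxy's database and can be queried directly. The sketch below is a minimal example, assuming the default SQLite backend at database/universe.sqlite and the table and column names (job, job_parameter, galaxy_user) of a typical Galaxy schema; both the path and the names may differ between Galaxy versions, so verify them against your instance before relying on this.

```python
import sqlite3

# Assumption: default SQLite backend in a default location. Adjust if your
# instance uses PostgreSQL/MySQL or a non-standard database path.
DB_PATH = "database/universe.sqlite"

conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row

# Who ran which tool, and when. Table/column names follow a typical
# Galaxy schema and may differ across versions.
jobs = conn.execute(
    """
    SELECT j.id, u.email AS who, j.create_time AS started,
           j.update_time AS finished, j.tool_id AS tool, j.state
    FROM job j
    LEFT JOIN galaxy_user u ON u.id = j.user_id
    ORDER BY j.create_time DESC
    LIMIT 20
    """
).fetchall()

for job in jobs:
    print(dict(job))
    # How: the parameters applied in each workflow 'box' are recorded
    # per job in the job_parameter table.
    params = conn.execute(
        "SELECT name, value FROM job_parameter WHERE job_id = ?", (job["id"],)
    ).fetchall()
    for p in params:
        print("   ", p["name"], "=", p["value"])

conn.close()
```

Each job row answers who, when, and what tool; the job_parameter rows answer how. The workflow that connected the jobs is recorded separately in the workflow tables, where each saved revision is kept as its own row.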
Hi Anthony,

Are you using the "Galaxy reports webapp" (described in the Galaxy Development News Brief from June 8, 2010)? It will not answer all your questions, but we are using it and it is very handy for tracking down who has done what.

Regards,
Hans
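One point the reports webapp does not directly answer is computation time. A rough wall-clock figure per job can be derived from the job table's timestamps; below is a minimal sketch under the same assumptions as the query above (default SQLite backend, typical schema, timestamp format as stored by a typical Galaxy/SQLAlchemy setup).

```python
import sqlite3
from datetime import datetime

DB_PATH = "database/universe.sqlite"  # assumption: default SQLite backend

# Assumption: timestamps are stored as ISO-like strings in SQLite.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def parse_ts(value):
    """Parse a job timestamp, tolerating missing fractional seconds."""
    try:
        return datetime.strptime(value, TS_FORMAT)
    except ValueError:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(DB_PATH)
for job_id, tool, created, updated in conn.execute(
    "SELECT id, tool_id, create_time, update_time FROM job WHERE state = 'ok'"
):
    elapsed = parse_ts(updated) - parse_ts(created)
    # Note: create_time-to-update_time spans submission to completion,
    # so it includes time spent waiting in the cluster queue, not just
    # pure compute time.
    print(f"job {job_id} ({tool}): {elapsed}")
conn.close()
```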
participants (2)
- Anthony Ferrari
- Hans-Rudolf Hotz