Keeping track of what you did.
Hi all,

I am currently setting up a Galaxy instance on a cluster. My team intends to analyze data from NGS experiments, and our various pipelines/workflows will be integrated into Galaxy. This may be a simple question, but I haven't found useful information on this point.

When looking back at a result file, it is crucial for us to be able to know:

- who? (generated this file)
- when? (precise date and time)
- what? (applied workflow)
- how? (parameters applied in each workflow 'box')
- and, if possible, things like computation time and other details.

Is there something in Galaxy that allows us to retrieve this information? Log files, or a table in the database that we could query?

As far as I currently understand, histories are managed on a per-user basis, allowing one to save and share histories. Does this make it possible to reproduce exactly the same pipeline on new input data, without having to re-specify the parameters used ('applying a workflow instance')? Workflows may indeed be modified/improved frequently, and we have to keep a record of which 'version' was used to generate our results.

Also, where are the output files stored? Is it possible to force Galaxy to put a result file in a pre-specified place? It seems that once you have given Galaxy the input files, you have no control over where the output files are stored (internal management), and all you have is a link to the output in your history panel. For input files, I saw that there is a way to prevent Galaxy from duplicating them on upload and thus keep your directory tree, which is especially useful when dealing with huge NGS data. Does this kind of flexibility exist for output files?

Best regards,
A.
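Much of the 'who / when / what / how' information asked about above lives in Galaxy's database and can be queried directly. The sketch below is a minimal example, assuming the default SQLite backend at database/universe.sqlite and the table and column names (job, job_parameter, galaxy_user) of a typical Galaxy schema; both the path and the names may differ between Galaxy versions, so verify them against your instance before relying on this.

```python
import sqlite3

# Assumption: default SQLite backend in a default location. Adjust if your
# instance uses PostgreSQL/MySQL or a non-standard database path.
DB_PATH = "database/universe.sqlite"

conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row

# Who ran which tool, and when. Table/column names follow a typical
# Galaxy schema and may differ across versions.
jobs = conn.execute(
    """
    SELECT j.id, u.email AS who, j.create_time AS started,
           j.update_time AS finished, j.tool_id AS tool, j.state
    FROM job j
    LEFT JOIN galaxy_user u ON u.id = j.user_id
    ORDER BY j.create_time DESC
    LIMIT 20
    """
).fetchall()

for job in jobs:
    print(dict(job))
    # How: the parameters applied in each workflow 'box' are recorded
    # per job in the job_parameter table.
    params = conn.execute(
        "SELECT name, value FROM job_parameter WHERE job_id = ?", (job["id"],)
    ).fetchall()
    for p in params:
        print("   ", p["name"], "=", p["value"])

conn.close()
```

Each job row answers who, when, and what tool; the job_parameter rows answer how. The workflow that connected the jobs is recorded separately in the workflow tables, where each saved revision is kept as its own row.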
Hi Anthony,

Are you using the "Galaxy reports webapp" (described in the Galaxy Development News Brief from June 8, 2010)? It will not answer all your questions, but we are using it and it is very handy for tracking down who has done what.

Regards,
Hans
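One point the reports webapp does not directly answer is computation time. A rough wall-clock figure per job can be derived from the job table's timestamps; below is a minimal sketch under the same assumptions as the query above (default SQLite backend, typical schema, timestamp format as stored by a typical Galaxy/SQLAlchemy setup).

```python
import sqlite3
from datetime import datetime

DB_PATH = "database/universe.sqlite"  # assumption: default SQLite backend

# Assumption: timestamps are stored as ISO-like strings in SQLite.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def parse_ts(value):
    """Parse a job timestamp, tolerating missing fractional seconds."""
    try:
        return datetime.strptime(value, TS_FORMAT)
    except ValueError:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(DB_PATH)
for job_id, tool, created, updated in conn.execute(
    "SELECT id, tool_id, create_time, update_time FROM job WHERE state = 'ok'"
):
    elapsed = parse_ts(updated) - parse_ts(created)
    # Note: create_time-to-update_time spans submission to completion,
    # so it includes time spent waiting in the cluster queue, not just
    # pure compute time.
    print(f"job {job_id} ({tool}): {elapsed}")
conn.close()
```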
participants (2)
- Anthony Ferrari
- Hans-Rudolf Hotz