Thank you Greg.

> This job keeps information about the tool that was used, including the version. The results of the job running is the analysis consisting of one or more additional datasets.

[Wanmei] Does the job also keep information about which input dataset was used, besides the tool and version?
State:          ok
Job Id:         3865189
Create Time:    2012-05-28 00:00:56.419746
Time To Finish: 0:00:34
Session Id:     5531371
Tool:           Filter1
User:           xxxxxx
Runner:         pbs://torque.g2.bx.psu.edu/
Runner Id:      2305392.thumper.g2.bx.psu.edu
Remote Host:    xxx.xxx.xxx.xxx
Command Line:   python /galaxy/home/g2main/galaxy_main/tools/stats/filtering.py /galaxy/main_pool/pool1/files/004/366/dataset_4366992.dat /galaxy/main_pool/pool5/tmp/job_working_directory/003/865/3865189/galaxy_dataset_4366996.dat "c3!=__sq__No results__sq__" 30 "str,str,str,str,int,float,str,float,str,str,int,float,str,str,int,str,str,str,str,str,str,int,str,str,str,list,str,list,str,str"
Stdout:         Filtering with c3!='No results', kept 46.58% of 1241 valid lines (1241 total lines).
Stderr:
Stack Trace:    None
Info:           None

> With each new analysis, new datasets are produced. In no case are previous datasets overwritten. With the new analysis in your example, the job again has information about the tool / version combination that produced the dataset. So, like I described above, the job can be rerun at some later point. The resulting dataset is not versioned in the way you describe, but information is kept about the analysis process that produced the resulting dataset.

[Wanmei] I think that, for the example we discussed, you mean Galaxy will keep two separate jobs: Job #1 is the previous analysis with its corresponding tool, version, and output dataset; Job #2 is the new analysis with its corresponding tool, version, and output dataset. Is my understanding correct?

Thanks,
Wanmei

From: Greg Von Kuster <greg@bx.psu.edu>
To: Wanmei <wanmei_06@yahoo.com>
Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu>
Sent: Monday, May 28, 2012 7:13 AM
Subject: Re: [galaxy-dev] Data file and Analysis Program versioning

Hello Wanmei,

On May 27, 2012, at 9:43 PM, Wanmei wrote:

> Hi All,
>
> I am pretty new to Galaxy. I would like to understand Galaxy's versioning capability from an end-user perspective (I do not mean the versioning capability that Mercurial offers in the Galaxy repo).
>
> I did some research and found that the following link mentions a use case: if an end user would like to rerun an analysis which was previously run using a different version of the analysis program, then Galaxy will ask the end user whether he/she would like to proceed with the new analysis. From the screenshot in the link, it looks like Galaxy keeps track of the metadata of an output data file, such as which analysis program and which version of the code produced it. Is my understanding correct?
>
> http://wiki.g2.bx.psu.edu/Tool%20Shed#Galaxy_Tool_Versions

You are correct. In Galaxy, the process of providing an input dataset to an analysis tool creates a Galaxy job. This job keeps information about the tool that was used, including the version. The results of the job running is the analysis consisting of one or more additional datasets. At some later point when a Galaxy user attempts to rerun the job, the original job information is inspected to determine the tool / version combination that was used in the job. Then the current Galaxy tool box is inspected to see if that tool / version combination is available in the tool box or if a derivative tool / version combination is available, allowing the user to rerun the tool with either the original or the derivative.

> If the answer to my above question is yes, then I have one more question. Does Galaxy version the output data as well? What I mean is, for example, if the end user agrees to use a newer version of the code to rerun (answers Yes to Galaxy's prompt), will the newly generated output be marked as version #2 as opposed to the original output (version #1)? Or will it simply overwrite the previous analysis output file?

With each new analysis, new datasets are produced. In no case are previous datasets overwritten. With the new analysis in your example, the job again has information about the tool / version combination that produced the dataset. So, like I described above, the job can be rerun at some later point. The resulting dataset is not versioned in the way you describe, but information is kept about the analysis process that produced the resulting dataset.

Greg Von Kuster

> Thanks,
> Wanmei

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
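
As an illustration of the rerun behaviour described in this thread, here is a minimal Python sketch. It is not Galaxy's actual code: the Job and ToolBox classes, the resolve and rerun functions, and the rule that the newest installed version counts as the "derivative" are all assumptions made for the example. It only models the two points from the discussion: a job records the tool / version that produced its datasets, and a rerun creates a new job with a new dataset rather than overwriting the old one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical record of one analysis run (not Galaxy's real model)."""
    job_id: int
    tool_id: str
    tool_version: str
    output_datasets: Tuple[int, ...]


@dataclass
class ToolBox:
    """Hypothetical tool box: tool id -> installed versions, oldest first."""
    versions: Dict[str, List[str]] = field(default_factory=dict)

    def resolve(self, tool_id: str, version: str) -> Tuple[Optional[str], bool]:
        """Return (version_to_run, is_exact_match)."""
        installed = self.versions.get(tool_id, [])
        if version in installed:
            return version, True
        if installed:
            # Assumption: treat the newest installed version as the derivative.
            return installed[-1], False
        return None, False


def rerun(job: Job, toolbox: ToolBox, new_job_id: int, new_dataset_id: int,
          accept_derivative: bool = True) -> Optional[Job]:
    """Create a new job for a rerun; the original job is left untouched."""
    version, exact = toolbox.resolve(job.tool_id, job.tool_version)
    if version is None or (not exact and not accept_derivative):
        return None  # tool unavailable, or the user declined the derivative
    # A rerun always produces a new job with a new output dataset.
    return Job(new_job_id, job.tool_id, version, (new_dataset_id,))
```

With a hypothetical Job(1, "Filter1", "1.0.0", (100,)) and a tool box that only has Filter1 version "1.1.0" installed, rerun returns a second job recording version "1.1.0" and a new dataset id, while the first job and its dataset stay as they were. The version strings and ids here are made up for illustration.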