Metadata access using tool config
Hi everyone, I am working on a tool which attempts to create a file that stores all the metadata associated with a job execution. The things I know how to access right now are the file extension, name, history id, and dataset id. I'd like to know how to access other things like the job id, UUID, file size, job command line, job start time, job end time, cores allocated, job runtime, and any other important information associated with tool execution. I would prefer to get this information from the tool config file or my tool script, rather than have the user create an API key which they have to submit as a parameter to my tool. Let me know if you guys have any ideas! Katherine
Hi Katherine,

For the job-related stuff, I'm doing this in my tools that provide statistics for the Galaxy ChIP-exo instance I'm setting up for a lab here at Penn State. You can see variations of tool examples here: https://github.com/gregvonkuster/cegr-galaxy/tree/master/tools/cegr_statisti.... These tools are all included in workflows where each tool generates metadata about the tool that executed immediately prior to it. I use the while loop in the command to keep this tool from executing in the workflow before its immediate predecessor has completed. I assume you are doing something similar, where your tool will generate metadata for some different tool that has already been executed.

Here is an example of a basic command line in one of these tool configs that will provide the job stuff to the underlying script.

<command>
<![CDATA[
    #set non_ready_states = ['new', 'queued', 'running', 'setting_metadata', 'upload']
    #while $input.dataset.state in $non_ready_states:
        time.sleep(60)
    #end while
    #set history_id = $__app__.security.encode_id($input.history.id)
    #set history_name = $input.history.name
    #set job = $input.creating_job
    #set tool_id = $job.tool_id
    #set tool_parameters = ""
    #for p in $job.parameters:
        #set tool_parameters = $tool_parameters + "__SeP__" + $p.name
        #set tool_parameters = $tool_parameters + "__SeP__" + $p.value
    #end for
    python $__tool_directory__/bam_to_scidx_output_stats.py
        --config_file $__tool_directory__/stats_config.ini
        --input "$input"
        --input_id "$__app__.security.encode_id($input.id)"
        --input_datatype "$input.ext"
        --dbkey "$input.metadata.dbkey"
        --chrom_len_file ${chromInfo}
        --history_id "$history_id"
        --history_name "$history_name"
        --tool_id "$tool_id"
        --tool_parameters "$tool_parameters"
        --output "$output"
]]>
</command>

Cheers!
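On the script side, the joined --tool_parameters string has to be split back apart. A minimal sketch of that decoding step, assuming the "__SeP__" separator from the command above (the function name and example values are hypothetical, not taken from Greg's repository):

```python
# Hypothetical sketch: decode the "__SeP__"-joined --tool_parameters
# string back into (name, value) pairs. The separator token comes from
# the Cheetah loop in the tool config above.
SEPARATOR = "__SeP__"

def parse_tool_parameters(raw):
    """Split the joined string into a list of (name, value) tuples."""
    # The joined string starts with the separator, so drop empty fields.
    fields = [f for f in raw.split(SEPARATOR) if f != ""]
    # Fields alternate: name, value, name, value, ...
    return list(zip(fields[0::2], fields[1::2]))

# Example input, as the Cheetah #for loop would emit it (made-up values):
raw = "__SeP__input__SeP__dataset_42__SeP__dbkey__SeP__hg19"
print(parse_tool_parameters(raw))  # [('input', 'dataset_42'), ('dbkey', 'hg19')]
```

Note that this simple split assumes no parameter name or value itself contains the separator token, which is why an unusual string like "__SeP__" is used.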
Greg Von Kuster

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Greg,

Thanks for the link to your GitHub repo; some of the information there was very useful. Do you have any idea how to access some of the things you don't pass to your Python script yourself, such as the history content API id or the job runtime? For example, for the job runtime I feel like it should be:

#set job = $input.creating_job
#set runtime = $job.runtime

But that just gives an error.
Actually, I just figured out the history content API id; now I just need the extra job information like the runtime and the start and end times.
Hi Katherine,

Probably about the only way to get the job start time is job.create_time (a close estimate of the actual start time), and the end time is job.update_time (again, a close estimate), so you can calculate the job's estimated execution time using something like this:

import datetime
execute_time = job.update_time - job.create_time
execute_time = datetime.timedelta(seconds=execute_time.seconds)
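For anyone wanting to try that arithmetic outside Galaxy, here is a self-contained version of the same calculation, with made-up timestamps standing in for the Job model's create_time and update_time columns:

```python
import datetime

# Made-up stand-ins for the Job model's create_time / update_time columns
# (real values would come from Galaxy's database).
create_time = datetime.datetime(2016, 7, 8, 9, 0, 0)
update_time = datetime.datetime(2016, 7, 8, 9, 12, 34)

# Same arithmetic as the snippet above: subtract to get a timedelta,
# then rebuild it from whole seconds to drop the microseconds part.
execute_time = update_time - create_time
execute_time = datetime.timedelta(seconds=execute_time.seconds)
print(execute_time)  # 0:12:34
```

One caveat: timedelta.seconds covers only the time-of-day portion and ignores full days, so a job running longer than 24 hours would be under-reported; execute_time.total_seconds() avoids that.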
participants (2)
-
Katherine Beaulieu
-
Von Kuster, Greg