Retain the dbkey specified for an input dataset through a Galaxy workflow?
Hello Galaxians,
I’m running Galaxy 15.10 and running workflows that include tools that require reference genomes (e.g. Extract Genomic DNA). I set the dbkey for the input dataset and it is retained for some tools, but not others. Running the workflow multiple times, it looks like the dbkey is lost at different tool points in the workflow. Is this a known issue, or is there some setting I’ve missed? I’ve seen where the output datatype can be set for each tool, but not the dbkey. This is a problem because any downstream tools that require a dbkey result in errors.
I was running the dev branch for a while, but workflow bugs in that branch forced me to revert to 15.10.
I’ve searched Biostar and the mailing lists, but haven’t seen an answer for this specific issue, although there are several related threads from the past. Sorry if it’s been answered and I missed it.
Thanks very much for any help you can provide,
Greg Von Kuster
I’ve discovered that this issue is related to tools rather than workflows, and specifically to tools that produce dataset collections on output. In the job.finish() method, metadata that includes the input dataset’s dbkey setting is generated correctly for output datasets that are not part of a collection, but the dbkey (and possibly other metadata attributes) is lost if the output dataset is part of a collection. I’m still digging to find out how setting metadata for output dataset collections is handled differently from regular output datasets.
(In reply to Greg Von Kuster <greg@bx.psu.edu>, Jan 22, 2016, 2:34 PM.)
I’ve tracked down how the dbkey is getting lost on tool output datasets that are part of a collection, but now I’m wondering whether the tool’s <discover_datasets> tag is lacking information about the dbkey and whether this is why it is getting lost. At least the code implies this. John, can you help here?
The populate_collection_elements() function in ~/lib/galaxy/tools/parameters/output_collect.py looks for a match on dbkey from the <discover_datasets> tag set, and if there is no match, the default dbkey value “?” is associated with the output dataset in the collection. An example tool that results in this behavior has this tag set:
    <collection name="MP" type="list" label="Data MP: ${tool.name} on ${on_string}">
        <discover_datasets pattern="(?P<designation>.*)" directory="data_MP" ext="gff" visible="false" />
    </collection>
I’ve not found an example, either in the Galaxy code or in tools written to produce output collections, that includes a dbkey designation in the <discover_datasets> tag set, so I’m wondering if I am correctly understanding the intent of this code.
I have a work-around fix that works without adding a dbkey designation to the tag set. The caller of the populate_collection_elements() function is a function named collect_dynamic_collections(), whose signature includes the input datasets, from which the dbkey can be retained.
I can submit a PR that takes this approach, but if the fix is as simple as adding some kind of dbkey designation to the tag set, an example of what that should look like would be much appreciated.
Thanks very much!
Greg Von Kuster
(In reply to Greg Von Kuster <greg@bx.psu.edu>, Jan 25, 2016, 10:11 AM.)
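To make the behavior described above concrete, here is a minimal, standalone sketch of the lookup and of the proposed work-around. It is not Galaxy's code: resolve_dbkey(), DEFAULT_DBKEY and the input_dbkeys argument are hypothetical names used only for illustration, and the second pattern (with a named dbkey group) is an assumption about what a dbkey designation in the tag set might look like, based on my reading that the code looks for a dbkey match.

```python
import re

# Placeholder value that, per the message above, ends up on collection
# elements when no dbkey is matched.
DEFAULT_DBKEY = "?"


def resolve_dbkey(pattern, filename, input_dbkeys=()):
    """Return (designation, dbkey) for a discovered file, or None if the
    pattern does not match.

    Hypothetical helper: it mimics the described behavior (use a matched
    dbkey, otherwise "?") and then applies the proposed work-around of
    inheriting the dbkey from the job's input datasets.
    """
    match = re.match(pattern, filename)
    if match is None:
        return None
    groups = match.groupdict()
    designation = groups.get("designation")
    # No named "dbkey" group (or an empty match) falls back to "?".
    dbkey = groups.get("dbkey") or DEFAULT_DBKEY
    if dbkey == DEFAULT_DBKEY:
        # Work-around sketch: take the first input dbkey that is set.
        for candidate in input_dbkeys:
            if candidate and candidate != DEFAULT_DBKEY:
                dbkey = candidate
                break
    return designation, dbkey


# Pattern from the example tool: only a designation is captured.
print(resolve_dbkey(r"(?P<designation>.*)", "sample1.gff"))
# Same pattern, but with the input dataset's dbkey passed along.
print(resolve_dbkey(r"(?P<designation>.*)", "sample1.gff", ["hg19"]))
# Assumed alternative: the file name itself carries the dbkey.
print(resolve_dbkey(r"(?P<designation>.+)\.(?P<dbkey>[^.]+)\.gff", "s1.mm10.gff"))
```

The first call reproduces the symptom (the element gets “?”), the second shows the input dbkey being carried through, and the third shows the alternative in which the tool encodes the dbkey in its output file names.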
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
I’ve submitted an issue about this here: https://github.com/galaxyproject/galaxy/issues/1598
(In reply to Greg Von Kuster <greg@bx.psu.edu>, Jan 26, 2016, 8:15 AM.)
Participants (1): Greg Von Kuster