Using input parameter's file format in tool XML
Hi all,

I've noticed from a couple of examples that when defining the command line string in a tool's XML wrapper (i.e. Cheetah template), you can use $input_param_name.extension (or apparently $input_param_name.ext) to get the file format (metadata) for an input parameter called input_param_name (while of course you use $input_param_name to get the filename for this param).

This doesn't seem to be documented on the tool conf wiki: http://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax

Most of the examples I found looking at the tool XML use the longer form (.extension), although interestingly in a recent commit Dan used just .ext in the fastx_clipper.xml file to special-case Sanger FASTQ. Is that a valid alternative too? https://bitbucket.org/galaxy/galaxy-central/changeset/93d7007bd859

I'm curious about the naming here - did Galaxy once use the format name as the actual file extension? Now at least all data files seem to have .dat as the extension (which I'm sure presents a small problem for some command line tools which make file format inferences from the filename extension, requiring hacks in their Galaxy wrappers).

Peter
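To make the `$input_param_name` versus `$input_param_name.extension` distinction concrete, here is a rough Python analogue of a command template that special-cases one datatype. `InputParam`, `build_command`, the paths, the tool name `some_tool` and the `--sanger-quality` option are all hypothetical placeholders, not Galaxy's real wrapper classes or the actual fastx_clipper options; the point is only that the bare parameter renders as a file name while `.extension` (or `.ext`) gives a datatype string to branch on.

```python
from dataclasses import dataclass


@dataclass
class InputParam:
    """Hypothetical stand-in for an input parameter as seen by a Cheetah template."""

    file_name: str  # what $input_param_name renders as
    extension: str  # datatype string, what $input_param_name.extension renders as

    def __str__(self) -> str:
        # Rendering the bare parameter yields the on-disk file name
        return self.file_name


def build_command(inp: InputParam, out_path: str) -> str:
    """Python analogue of a command template that special-cases one datatype.

    "some_tool" and "--sanger-quality" are placeholders, not a real tool or option.
    """
    parts = ["some_tool", "-i", str(inp), "-o", out_path]
    # Template equivalent: #if $input.extension == "fastqsanger" ... #end if
    if inp.extension == "fastqsanger":
        parts.append("--sanger-quality")
    return " ".join(parts)


if __name__ == "__main__":
    reads = InputParam("/galaxy/files/000/dataset_1.dat", "fastqsanger")
    print(build_command(reads, "/galaxy/files/000/dataset_2.dat"))
    # some_tool -i /galaxy/files/000/dataset_1.dat -o /galaxy/files/000/dataset_2.dat --sanger-quality
```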
Hi Peter,

.ext and .extension are actually the same thing (.ext is a readonly @property that returns .extension); this contains the string that Galaxy uses to declare the datatype for a particular instance of a dataset and is also used for the extension for the file when a user clicks the save icon for the dataset in their history.

You are correct that all datasets are stored on disk with a .dat extension. Because the underlying files can actually be set with several different datatypes at a single time (copies of datasets within and between histories, libraries and users, where a user can change the datatype of a history item manually), it's not really feasible to have them stored on disk with a more meaningful extension.

However, there has been talk in the past about ways to allow the xml for individual tools to specify an actual filename that will be used during tool execution (e.g. using symlinks). This is definitely a worthwhile feature and would, as you suggest, prevent the need for hacks in several wrappers, but I don't think anyone is working on this currently.

Thanks,
Dan

On Dec 16, 2010, at 12:03 PM, Peter wrote:
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
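The relationship Dan describes, `.ext` as a read-only `@property` returning `.extension`, can be sketched in a few lines of Python. `DatasetInstance` below is a simplified, hypothetical stand-in rather than Galaxy's actual model class; it also illustrates why changing a dataset's datatype only touches metadata and leaves the `.dat` file name alone.

```python
class DatasetInstance:
    """Simplified, hypothetical stand-in for a Galaxy dataset instance."""

    def __init__(self, file_name: str, extension: str) -> None:
        self.file_name = file_name  # on-disk path, which ends in .dat
        self.extension = extension  # datatype string, e.g. "fastq"

    @property
    def ext(self) -> str:
        # Read-only alias: no setter is defined, so assigning to .ext raises AttributeError
        return self.extension


if __name__ == "__main__":
    ds = DatasetInstance("/galaxy/files/000/dataset_7.dat", "fastq")
    print(ds.ext)  # fastq
    ds.extension = "fastqsanger"  # user changes the datatype: metadata only
    print(ds.ext, ds.file_name)  # fastqsanger /galaxy/files/000/dataset_7.dat
    try:
        ds.ext = "txt"
    except AttributeError:
        print("ext is read-only")
```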
On Fri, Dec 17, 2010 at 1:11 PM, Daniel Blankenberg wrote:
Hi Peter,
Hi Dan,

Thanks for the informative email,
.ext and .extension are actually the same thing (.ext is a readonly @property that returns .extension); this contains the string that Galaxy uses to declare the datatype for a particular instance of a dataset and is also used for the extension for the file when a user clicks the save icon for the dataset in their history.
Do you think it could be added to the wiki page? http://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax
You are correct that all datasets are stored on disk with a .dat extension. Because the underlying files can actually be set with several different datatypes at a single time (copies of datasets within and between histories, libraries and users, where a user can change the datatype of a history item manually), it's not really feasible to have them stored on disk with a more meaningful extension.
I see - you just change the metadata without having to modify the file on disk. That makes good sense.
However, there has been talk in the past about ways to allow the xml for individual tools to specify an actual filename that will be used during tool execution (e.g. using symlinks). This is definitely a worthwhile feature and would, as you suggest, prevent the need for hacks in several wrappers, but I don't think anyone is working on this currently.
There are two cases I can think of here: one is input files, where some tools look at the extension (e.g. sam vs bam); the other is output files, where the tool doesn't give you any control (e.g. it will use the input filename with another extension). Either would require hacks in the wrapper, so some more flexibility here could be useful in the future.

Regards,
Peter
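One common shape for those wrapper hacks, sketched in Python under the assumption of a POSIX file system with symlink support: link the `.dat` input to a name carrying the extension the tool expects (the extension itself could come from `$input.ext`), then move whatever file the tool chose to name itself onto the output path Galaxy expects. `link_with_extension` and `collect_output` are hypothetical helper names, and running the tool itself is left out.

```python
import os
import shutil
import tempfile


def link_with_extension(dat_path: str, ext: str, work_dir: str) -> str:
    """Create work_dir/input.<ext> as a symlink to dat_path and return its path."""
    link_path = os.path.join(work_dir, "input." + ext)
    os.symlink(os.path.abspath(dat_path), link_path)
    return link_path


def collect_output(tool_output: str, galaxy_output: str) -> None:
    """Move a file whose name the tool chose onto the path Galaxy expects."""
    shutil.move(tool_output, galaxy_output)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dat_in = os.path.join(tmp, "dataset_1.dat")
        with open(dat_in, "w") as handle:
            handle.write("placeholder input\n")
        linked = link_with_extension(dat_in, "sam", tmp)
        print(os.path.basename(linked))  # input.sam
        # Simulate a tool that writes its output next to the input, swapping the extension
        tool_out = os.path.join(tmp, "input.bam")
        with open(tool_out, "w") as handle:
            handle.write("placeholder output\n")
        dat_out = os.path.join(tmp, "dataset_2.dat")
        collect_output(tool_out, dat_out)
        print(os.path.exists(dat_out), os.path.exists(tool_out))  # True False
```

The symlink avoids copying potentially large datasets, while the move on the output side is needed because Galaxy only picks up results from the `.dat` path it assigned.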
Hi Peter,
There are two cases I can think of here: one is input files, where some tools look at the extension (e.g. sam vs bam); the other is output files, where the tool doesn't give you any control (e.g. it will use the input filename with another extension). Either would require hacks in the wrapper, so some more flexibility here could be useful in the future.
This is exactly what we had in mind. IIRC, there was even some talk of possible syntax on one of the mailing lists (but a quick search failed to find it). It is a very common problem, which forces the use of relatively simple wrappers just to handle completely customized dataset names. Certainly a built-in method in the framework to handle this would be a great improvement.

Thanks,
Dan

On Dec 17, 2010, at 8:36 AM, Peter wrote: