Spaces in GenomeSpace file URLs from genomespace_importer tool
Hi,

I am encountering a problem executing the genomespace_importer tool using the Galaxy API. The tool works just fine for copying files from GenomeSpace into a Galaxy history dataset EXCEPT when the GenomeSpace file URL includes a space (" ") of any sort.

Below is an example problem request (the GenomeSpace URL is public, so you should be able to try it if you like). Note that the "URL" parameter includes two %20 sequences, which are supposed to URL-encode the space character.

    POST https://usegalaxy.org/api/tools?key=XXXXXX
    {
      "history_id": "039421d939e31170",
      "tool_id": "genomespace_importer",
      "inputs": {
        "URL": "https://dm.genomespace.org/datamanager/file/Home/Public/RecipeData/SequenceData_fa%20fasta%20fastq/RNA-Seq.fastq",
        "gs-token": null
      }
    }

The job is accepted by Galaxy but eventually fails: an HTTP error occurs during the execution of the job. I looked through the logs in GenomeSpace. It turns out that the URL Galaxy uses to do a GET on the file is

    https://gsui.genomespace.org/datamanager/v1.0/file/Home/Public/RecipeData/SequenceData_faX20fastaX20fastq/RNA-Seq.fastq

The "percents" are converted to X (I made the Xs bigger for emphasis).

I attempted a couple of other experiments in the tool job submission: using actual space characters instead of %20, and using + instead of space. In both of these attempts, the file path from the URL param was passed unchanged to the GenomeSpace GET, leading to a BAD REQUEST error with the actual space character and a NOT FOUND error with the +.

So the question is: what kind of decoding is going on in the Galaxy genomespace_importer tool, and how do I get rid of the Xs?

Thanks
Marco
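For anyone wanting to reproduce the three variants Marco describes, the following is a minimal sketch using only the Python standard library. It builds the request body and prints the three URL forms (percent-encoded, literal space, plus) without contacting any server. The helper name `build_payload` is hypothetical; the history ID, tool ID, and file path are copied from the message above.

```python
import json
from urllib.parse import quote, quote_plus

# Values copied from the request in the message above.
BASE = "https://dm.genomespace.org/datamanager/file"
RAW_PATH = "/Home/Public/RecipeData/SequenceData_fa fasta fastq/RNA-Seq.fastq"


def build_payload(url: str) -> str:
    """Return the JSON body for POST /api/tools (hypothetical helper)."""
    return json.dumps(
        {
            "history_id": "039421d939e31170",
            "tool_id": "genomespace_importer",
            "inputs": {"URL": url, "gs-token": None},
        },
        indent=2,
    )


# The three forms of the same file URL that were tried.
variants = {
    "percent-encoded": BASE + quote(RAW_PATH),      # spaces become %20
    "literal space": BASE + RAW_PATH,               # spaces left untouched
    "plus": BASE + quote_plus(RAW_PATH, safe="/"),  # spaces become +
}

for label, url in variants.items():
    print(label, "->", url)

print(build_payload(variants["percent-encoded"]))
```

Only the percent-encoded form is a valid way to carry a space in a URL path; a + means "space" only in form-encoded query strings, which may be why the + attempt ended in NOT FOUND rather than BAD REQUEST.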
Hi Marco,

I'd lay money on the tool parameter sanitization done in Cheetah, which is intended to prevent any command injection into the shell command. This can be configured within the tool definition using the <sanitizer> tag set:

https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#A.3Csanitizer.3E...

Which genomespace_importer are you working from (URL please)?

Peter

On Wed, Oct 21, 2015 at 3:20 AM, Marco Ocana <mocana@broadinstitute.org> wrote:
> So the question is: what kind of decoding is going on in the Galaxy genomespace_importer tool and how do I get rid of the Xs?
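Peter's hypothesis can be illustrated with a toy allow-list sanitizer. This is a sketch only, not Galaxy's code: the function `toy_sanitize`, the allowed character set, and the "X" replacement character are all assumptions chosen to reproduce the symptom seen in the GenomeSpace logs.

```python
import string

# Assumed allow-list, for illustration only. The real set is defined by
# Galaxy and by any <sanitizer> configuration in the tool definition.
ALLOWED = set(string.ascii_letters + string.digits + "-_.:/")


def toy_sanitize(value: str, allowed: set = ALLOWED, replacement: str = "X") -> str:
    """Replace every character outside the allow-list (hypothetical helper)."""
    return "".join(ch if ch in allowed else replacement for ch in value)


path = "/Home/Public/RecipeData/SequenceData_fa%20fasta%20fastq/RNA-Seq.fastq"
print(toy_sanitize(path))
# /Home/Public/RecipeData/SequenceData_faX20fastaX20fastq/RNA-Seq.fastq
```

Because "%" is not in the assumed allow-list, each %20 turns into X20, which matches the path GenomeSpace received.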
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Peter,

I am getting the metadata for the tool from:

https://usegalaxy.org/api/tools/genomespace_importer

Thanks
Marco

On Wed, Oct 21, 2015 at 8:01 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> Which genomespace_importer are you working from (URL please)?
Thanks Marco,

OK, that seems to still be part of the Galaxy core:

https://github.com/galaxyproject/galaxy/blob/dev/tools/genomespace/genomespa...

I think you need to file a bug with the main GitHub repository (or Trello).

Looking at genomespace_importer.xml I don't see how the hidden URL parameter is used, but I would still guess it just needs a <sanitizer> tag to allow the percent sign, so that any URL-encoded characters are passed through unchanged.

Peter

On Wed, Oct 21, 2015 at 3:01 PM, Marco Ocana <mocana@broadinstitute.org> wrote:
> I am getting the metadata for the tool from:
> https://usegalaxy.org/api/tools/genomespace_importer
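The effect of the suggested fix (letting the percent sign through) can be sketched with a toy model. Everything here is an assumption for illustration: `toy_sanitize` and its allow-list are hypothetical and are not Galaxy's implementation, and the real change would be made with a <sanitizer> tag in the tool XML rather than in Python.

```python
import string
from urllib.parse import unquote

# Assumed baseline allow-list; "%" is deliberately absent.
BASE_ALLOWED = set(string.ascii_letters + string.digits + "-_.:/")


def toy_sanitize(value: str, allowed: set, replacement: str = "X") -> str:
    """Replace every character outside the allow-list (hypothetical helper)."""
    return "".join(ch if ch in allowed else replacement for ch in value)


encoded = "SequenceData_fa%20fasta%20fastq/RNA-Seq.fastq"

strict = toy_sanitize(encoded, BASE_ALLOWED)           # "%" rejected
relaxed = toy_sanitize(encoded, BASE_ALLOWED | {"%"})  # "%" permitted

print(strict)            # percent-escapes are destroyed
print(relaxed)           # percent-escapes survive intact
print(unquote(relaxed))  # and still decode to the name with spaces
```

With "%" permitted, the encoded path passes through unchanged, so the receiving server can decode %20 back to a space.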
Thanks for your help, Peter. I have reported the problem using the built-in Galaxy reporting tool from one of my failed jobs. I also created a Trello bug submission.

Regards,
Marco

On Wed, Oct 21, 2015 at 10:15 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> I think you need to file a bug with the main GitHub repository (or Trello).
See: https://trello.com/c/YBK8ZPFb/

Hopefully one of the Galaxy team can take a look.

Peter

On Wed, Oct 21, 2015 at 3:48 PM, Marco Ocana <mocana@broadinstitute.org> wrote:
> I have reported the problem using the built in Galaxy reporting tool from one of my failed jobs. I also created a Trello bug submission.
participants (2)
- Marco Ocana
- Peter Cock