Re: [galaxy-dev] Output file to history (Sebastian Luna Valero)
Hi Sebastian,
I am not an expert in Galaxy development, so the following is just my personal opinion. I do not think it is possible to meet your requirement as stated. As far as I understand, Galaxy searches specific folders, defined in universe_wsgi.ini by a parameter called #collect_outputs_from = new_file_path,job_working_directory. On top of that, a job id is required, which is why the job id is part of the file name in the dynamic multiple-output convention. If your script.py cannot follow these rules, Galaxy will not recognize the outputs.
Normally in this kind of case, I write an additional script that executes script.py with its original parameters and does the file manipulation afterwards. Of course this is not ideal, but I know nothing about Python. It may be possible to write some more complicated Python code within the command tag of your wrapper file to do this.
I am also waiting here for the experts' opinions and comments.
Best regards! Jun
On 3/21/14 9:28 AM, Sebastian Luna Valero wrote:
Dear All,
I am trying to add a new tool in Galaxy and I have the following problem.
Let me explain a simplified example. Let's imagine that my script works from CLI as follows:
python script.py --input "input-file" --pattern "output-pattern"
After processing "input-file", the script writes several output files whose names are derived from "output-pattern".
The number of output files depends on the content of "input-file". The output files are written to the current working directory where script.py is located.
In the simplest scenario, I get only one output file in the working directory.
My problem is that I would like to see the output files in Galaxy's history without modifying the CLI interface.
To solve this problem, I have looked at the wiki page:
https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run
and the email here:
http://dev.list.galaxyproject.org/Multiple-output-files-do-not-appear-in-history-td4660470.html
However, I think that mine is a different scenario. I do not want to add new output parameters to the CLI. I would like Galaxy to bring these output files into my history without modifying the CLI options. Is that possible?
Many thanks in advance for your help!
Best regards, Sebastian.
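The wrapper approach Jun describes (run script.py unchanged, then rename its outputs) could look roughly like the Python sketch below. This is only an illustration, not Galaxy code: run_and_collect, its parameters, and the default name_template are hypothetical, and the template is merely modeled on the idea of putting an id into the file name as mentioned above. Check the wiki page for the exact naming your Galaxy version expects.

```python
import glob
import os
import shutil
import subprocess
import sys


def run_and_collect(cmd, workdir, pattern, dest_dir, output_id=0,
                    name_template="primary_{output_id}_{designation}_visible_{ext}"):
    """Run cmd inside workdir, then move files matching pattern into dest_dir.

    name_template is an ASSUMED naming convention, not confirmed by this thread.
    """
    # Run the unmodified tool; raises CalledProcessError on a non-zero exit.
    subprocess.check_call(cmd, cwd=workdir)

    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    # The glob result is computed once, before any file is moved.
    for path in sorted(glob.glob(os.path.join(workdir, pattern))):
        stem, ext = os.path.splitext(os.path.basename(path))
        # Underscores separate the fields of the assumed template,
        # so replace them inside the designation.
        designation = stem.replace("_", "-")
        new_name = name_template.format(
            output_id=output_id,
            designation=designation,
            ext=ext.lstrip(".") or "data",
        )
        target = os.path.join(dest_dir, new_name)
        shutil.move(path, target)
        moved.append(target)
    return moved
```

For the example in this thread, cmd would be something like ["python", "script.py", "--input", input_path, "--pattern", "out"] with pattern "out*", so the tool's own CLI stays untouched and only the wrapper knows about the renaming.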
Hi, Jun and Sebastian,
This is not trivial, but it is already being done by some tools using the method described at https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Single_... in case that solves your problem.
The Galaxy Html composite datatype can be used to expose arbitrary outputs without creating a separate history item for each one, if that's what you want. The tool executable or the wrapper must write valid and complete Html content to the path Galaxy provides as the Html file for the tool. For the page to look good, the sanitize_all_html switch must also be turned off in universe_wsgi.ini (which is a potential security problem for public sites!). Since the script itself generates all the output, it is possible to write code that creates a nicely laid out page of links and images for the user. Composite datatypes (like Html) are documented in the wiki, and contributions to the wiki are always welcome, but at present this is an advanced and not well documented possibility.
If there really is demand for this functionality to be exposed, I have been thinking about an autoHtml datatype that would do what the tool factory does: look inside the job working directory at the end of the job and arrange every file it finds there into a simple Html page of links. If https://trello.com/c/vNQLZnSk gets enough upvotes, we'll put it into the development pipeline?
On Sun, Mar 23, 2014 at 4:43 AM, Jun Fan <j.fan@qmul.ac.uk> wrote:
[...]
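The "simple Html page of links" idea Ross mentions can be sketched in a few lines of Python. This is a guess at the general shape, not the tool factory's actual code: write_index_page and its arguments are hypothetical, and how a tool wrapper obtains the output directory and the Html file path from Galaxy is left out. The relative links assume the listed files end up being served from the same location as the page.

```python
import html
import os
from urllib.parse import quote


def write_index_page(directory, html_path, title="Tool outputs"):
    """Write an HTML page that links to every regular file in `directory`.

    Returns the sorted list of file names that were linked.
    """
    skip = os.path.abspath(html_path)
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        # Do not list the index page itself if it lives in the same folder.
        and os.path.abspath(os.path.join(directory, name)) != skip
    )
    # URL-quote the link target and HTML-escape everything placed in the page.
    items = "\n".join(
        '<li><a href="{href}">{text}</a></li>'.format(
            href=html.escape(quote(name), quote=True),
            text=html.escape(name),
        )
        for name in names
    )
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        "<body><h1>{title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    ).format(title=html.escape(title), items=items)
    with open(html_path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return names
```

A wrapper would call this once, after script.py has finished, so that the single Html history item exposes however many files the run happened to produce.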
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
I have opened a pull request that I believe could significantly ease operations like this (for instance, grabbing a directory of similar output files, all of the same type, based on a pattern specified by the tool instead of by Galaxy). There are lots of details in the pull request body and some examples in the form of both functional and unit tests: https://bitbucket.org/galaxy/galaxy-central/pull-request/356/enhancements-fo...
Let me know (either on the pull request or here) if you would like to see changes or if this is "good enough". These kinds of dynamic outputs are not quite as feature-full as I would like, but hopefully these limitations will be addressed by the forthcoming dataset collection work, which I believe will be able to leverage the work in this pull request.
-John
On Sat, Mar 22, 2014 at 6:19 PM, Ross <ross.lazarus@gmail.com> wrote:
[...]
participants (3)
- John Chilton
- Jun Fan
- Ross