Hi Keith,
On 21.04.2015 at 21:21, Keith Suderman wrote:
On Apr 15, 2015, at 5:35 AM, Björn Grüning <bjoern.gruening@gmail.com> wrote:
Nice! Can I convince you to put these tools into the TS if everything is working? Maybe with a German beer? ;)
Do you have a beer preference?
Outing: I'm one of the rare Germans that do not drink alcohol ;)
We will definitely share our tools when they are stable, either on the public tool shed or we will set up our own (public) tool shed.
Nice!
However, there are two issues I will have to address before releasing anything:
1. All of our tools use a custom command interpreter; anyone wanting to install our tools would have to install our interpreter, make it available on the $PATH and restart Galaxy. That is why I was thinking we should set up our own tool shed; to install from our tool shed assumes you have our interpreter installed.
This can be done via the ToolShed. I assume your custom command interpreter is no different from Python or Perl as an interpreter? Fine, we have Python and Perl in the TS, so we can have "your custom interpreter" as well. In the end your tools will depend on this and voila! No extra Tool Shed needed.
2. Tests: almost all of our tools call the same script that in turn calls some remote web service. The web services go through their own unit/integration tests before they are deployed so all the Galaxy tests really do is use a lot of bandwidth to check if the server has an internet connection.
Is this a question? Do you want to improve this? Even if you depend on such a web service, these tests will still check the syntax and the outputs of your tool.
True, but you can add a post-processing action, where you can change the data type.
Ok, I found it. I can add the post-processing step to change the data type and that allows me to connect the tools in the workflow editor, but the converter is not being invoked when I run the workflow.
This sounds like a bug. Or you are mixing up conversion with "relabelling". Galaxy has two concepts. One is really converting files ... creating a new dataset. The other is changing metadata ... telling Galaxy you know better what file type it is, without creating a new dataset. You are probably looking for conversion tools. These need to be reachable via the normal toolbox, if I'm not mistaken.
The editor also allows me to select output formats that have no converters defined, so either I am still missing something or the workflow editor does not do what I want. I can convert formats through the "Edit attributes" menu, so Galaxy knows about my converters and how to invoke them, just not in the workflow editor.
Ok, I think I understood. Not sure if this is the best way but put your converters into the toolbox.
You can filter sqlite according to some attached metadata, for example the version. You need to define your own metadata, and during file creation the metadata will be 'calculated' and set. You can then filter in your next tool according to this metadata.
Do you have more pointers to tools that use the attached metadata? In particular tools that set metadata that is consumed by subsequent tools.
The sqlite datatype should be a good example. Keep in mind, we cannot set metadata from inside a tool. IMHO this is not possible yet, but it is a commonly requested feature. But you can "calculate" such metadata inside your datatype definition and have it set implicitly after your tool has finished.
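To make the idea of "calculating" metadata from the file itself more concrete, here is a minimal standalone sketch of what such a calculation could look like for a SQLite file. It is not Galaxy code: the function name compute_sqlite_metadata and the dictionary keys are assumptions made for illustration, and inside Galaxy the equivalent logic would have to live in the datatype definition rather than in a tool.

```python
import sqlite3


def compute_sqlite_metadata(path):
    """Return table names, column names and row counts for a SQLite file.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    conn = sqlite3.connect(path)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        tables = [row[0] for row in cur.fetchall()]
        columns = {}
        row_counts = {}
        for table in tables:
            # Quote the identifier so unusual table names do not break the SQL.
            quoted = '"' + table.replace('"', '""') + '"'
            cur.execute(f"PRAGMA table_info({quoted})")
            # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
            columns[table] = [col[1] for col in cur.fetchall()]
            cur.execute(f"SELECT COUNT(*) FROM {quoted}")
            row_counts[table] = cur.fetchone()[0]
        return {
            "tables": tables,
            "table_columns": columns,
            "table_row_count": row_counts,
        }
    finally:
        conn.close()
```

A downstream tool could then filter its inputs on values like these, which is the kind of filtering described above.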
A typical workflow might look something like:
a) Query Tool -> Server, find all documents that contain the word “cheese”
b) Server -> Here is the list of document IDs [ id1, id2, …, idn ]
c) WorkFlow -> for each id in the list do
   c1) Download document
   c2) Work work work work…
   c3) Persist output
I can do all of the above except the most important bit; iterating…
Oh yes, this is simple. Just create one workflow that deals with one ID. This workflow you can run on multiple ids.
That is the question; how do I run the same workflow on multiple ids? The server may return hundreds or thousands of id values so running the workflow manually for each id is not really an option.
You have all ids in one file? You could split this file into hundreds of files, one per id, collect everything in a data collection (this is a new feature in Galaxy), and run your workflow over this collection.
Should work, will work for sure in the near future, no promises ;)
The ids are all in one file, but it is easy enough to split them into 100s (1000s) of files if needed.
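The splitting step could be sketched roughly as follows. This assumes the id file holds one id per line, which the thread does not state; the function name split_ids and the id_NNN.txt naming scheme are made up for illustration. Turning the resulting files into a Galaxy data collection is a separate step not shown here.

```python
from pathlib import Path


def split_ids(id_file, out_dir):
    """Write each non-empty line of id_file to its own file in out_dir.

    Hypothetical helper; assumes one document id per line.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(id_file, encoding="utf-8") as handle:
        ids = [line.strip() for line in handle if line.strip()]
    # Zero-pad the counter so the files sort in the original order.
    width = len(str(len(ids)))
    written = []
    for index, doc_id in enumerate(ids, start=1):
        # Name files by position, not by id, in case ids contain
        # characters that are not safe in file names.
        target = out / f"id_{index:0{width}d}.txt"
        target.write_text(doc_id + "\n", encoding="utf-8")
        written.append(target)
    return written
```

For example, split_ids("ids.txt", "split_ids_out") would leave one small file per id in split_ids_out, ready to be uploaded together.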
Do you have pointers to any documentation on data collections? My searches haven't turned up much but tantalizing references [1], and my experiments trying to return a data collection from a tool have been unsuccessful.
https://wiki.galaxyproject.org/Histories?highlight=%28collection%29#Dataset_...
https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax -> data_collection
And have a look at: https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools
I have tried the instructions for tools that generate multiple output files [2], but the Galaxy UI starts having problems when I add more than a few hundred history items; sorry I didn't make better notes, but the problem (timeouts) seems to be with jQuery updating the CSS styles in the history panel. It also makes the UI a bit unwieldy with that many history items.
Are you running a recent Galaxy version? Try to run the latest developer version, data collections are really new and I hope it will shine even more with the next release.
I have also been trying John Chilton's blend4j and managed to populate a data library, and this is almost what I want, but I would like a tool that can be included in a workflow, as the data from the library may not necessarily be the first step. I have no problem calling the Galaxy API from my tools, except that between the bioinformatics lingo and Python (I'm a Java programmer) it's slow going.
If at all possible you should avoid this, but as a last resort it is probably an option. Ciao, Bjoern
Cheers, Keith
REFERENCES
1. https://wiki.galaxyproject.org/Learn/API#Collections
2. https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_...
Oh yes, this is supported out of the box! See here for some brief documentation: https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox#support...
Here is an example of how you can write your own datatypes:
https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/datatyp...
I feel like I must be missing the obvious. Here is the relevant section of my datatypes_conf.xml (you can see the full file at https://github.com/oanc/Galaxy/blob/master/config/datatypes_conf.xml)
<datatype extension="lif" type="galaxy.datatypes.text:Json" display_in_upload="True">
    <converter file="convert.json2gate_2.0.0.xml" target_datatype="gate"/>
</datatype>
<datatype extension="gate" type="galaxy.datatypes.xml:GenericXml" mimetype="application/xml" display_in_upload="true">
    <converter file="convert.gate2json_2.0.0.xml" target_datatype="lif"/>
</datatype>
Is there anything I need to do beyond defining the datatypes for implicit conversions to take place?
I guess you need to place your converters under https://github.com/oanc/Galaxy/tree/master/lib/galaxy/datatypes/converters/
And get rid of 'convert.' in your datatypes_conf.xml at least if you are not using the TS.
Hope this helps you a little bit more, Bjoern
Thanks, Keith Suderman
4. OAuth 2.0 / OpenID Connect:
I need to be able to fetch documents from data providers that require an OAuth 2.0 access token. Currently, I use a separate service to go through the OAuth authentication/authorization process and then have the user copy/paste their access token into a text field in Galaxy. Is there a way to perform the OAuth authentication dance required by the remote service inside Galaxy itself?
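For reference, the copy/paste approach described above amounts to a tool script attaching the pasted token to its outgoing request. A minimal sketch, assuming the data provider accepts standard bearer tokens in the Authorization header (RFC 6750); the function name build_document_request and the example URL are hypothetical:

```python
import urllib.request


def build_document_request(url, access_token):
    """Build a GET request that carries an OAuth 2.0 bearer token.

    Hypothetical helper; assumes the provider accepts RFC 6750 bearer tokens.
    """
    request = urllib.request.Request(url, method="GET")
    # Pasted tokens often carry stray whitespace or a trailing newline.
    request.add_header("Authorization", f"Bearer {access_token.strip()}")
    return request


# Building the request does not touch the network; it would be sent
# later with urllib.request.urlopen(request).
example = build_document_request("https://example.org/documents/id1", "abc123")
```

This only covers using a token that already exists; obtaining one still requires the separate authorization service.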
I don't think so, but maybe someone else has an idea here.
I’ve looked at the Trello site for Galaxy and see that both OAuth 2.0 and OpenID Connect are on the radar; hopefully this use case is being considered as well.
I’m sure to have more questions after working through some visualization examples, but this should keep me busy for now.
Hope you are busy now :) Cheers and keep us up to date! Bjoern
Sincerely, Keith Suderman
REFERENCES
1. https://wiki.galaxyproject.org/Admin/Tools/AddingTools
------------------------------ Research Associate Department of Computer Science Vassar College Poughkeepsie, NY
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/