Hi Keith,
Hi Björn,
Thanks for the quick reply.
It might help to take a look at what I have so far. Our Galaxy server is running at http://grid.anc.org:8000. The front page contains a very basic tutorial (and I use the term loosely) with instructions for creating a simple workflow with our tools. The "simple tutorial" assumes a good understanding of Galaxy.
Nice! Can I convince you to put these tools into the TS if everything is working? Maybe with a German beer? ;)
It is also likely worthwhile pointing out that all of our "tools" invoke SOAP/REST web services running on other servers; nothing is "local". I tend to use the terms "tool" and "service" interchangeably since all of our tools are web services.
Can you tell me how your tool detects whether its input was already processed by another tool? Metadata detection? Is this a different file type? If so, you can define your own datatype(s). Then one of your tools can only consume the file type that another tool outputs, and so on.
We have two issues: 1. the format of the data 2. what the data contains.
Format: This is easy; I have converters defined in the datatypes_conf.xml file and the workflow editor won't let me connect tools if the output/input formats don't match.
True, but you can add a post-processing action, where you can change the data type.
Data Contents: Just because Tool A produces format X and Tool B accepts format X does not mean the tools can be connected; I need to do deeper validation than simple format matching. For example, if you drop several of the GATE tools into a workflow you can connect them in any order, as they all accept and produce the same format. However, the tools must be connected in the correct order to produce something other than error messages.
There are two ways we do validation right now: each document contains metadata that describes the contents, and each service (tool) can produce metadata describing its input and output. So we can check if two tools can be connected (the output of the first satisfies the input requirements of the second) and each tool checks at runtime that the input contains the necessary data.
You can probably use data type metadata for such a use case. Please have a look at the sqlite datatype and gemini that is using it: https://github.com/galaxyproject/tools-iuc/blob/master/tools/gemini/gemini_m... You can filter sqlite according to some attached metadata, for example the version. You need to define your own metadata, and during file creation the metadata will be 'calculated' and set. You can then filter in your next tool according to this metadata.
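To make the "can these two tools be connected" check above concrete, here is a rough Python sketch. It is not how our services or Galaxy actually implement it; the `ServiceMetadata` class, the `requires`/`produces` fields, and the annotation names in the usage note are all hypothetical and only illustrate the idea that each tool declares what it needs and what it adds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical description of what a service needs and what it adds."""
    name: str
    requires: frozenset = field(default_factory=frozenset)
    produces: frozenset = field(default_factory=frozenset)


def validate_chain(services, initial=frozenset()):
    """Walk a linear chain of services and return a list of error strings.

    An empty list means every service finds the annotations it requires
    among those produced by the services before it (plus `initial`).
    """
    available = set(initial)
    errors = []
    for svc in services:
        missing = svc.requires - available
        if missing:
            errors.append(f"{svc.name} is missing: {', '.join(sorted(missing))}")
        # Whatever this service adds is available to the ones after it.
        available |= svc.produces
    return errors
```

With a made-up tokenizer that produces "token" and a made-up tagger that requires "token", `validate_chain([tokenizer, tagger])` returns an empty list, while the reversed order reports that the tagger is missing "token". The same format on both sides is not enough; the order matters.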
A typical workflow might look something like:
a) Query Tool -> Server, find all documents that contain the word “cheese”
b) Server -> Here is the list of document IDs [ id1, id2, …, idn ]
c) WorkFlow -> for each id in the list do
   c1) Download document
   c2) Work work work work…
   c3) Persist output
I can do all of the above except the most important bit; iterating…
Oh yes, this is simple. Just create one workflow that deals with one ID. This workflow you can run on multiple ids.
That is the question; how do I run the same workflow on multiple ids? The server may return hundreds or thousands of id values so running the workflow manually for each id is not really an option.
You have all ids in one file? You could split this file into hundreds of files, one per id, collect them in a data collection (this is a new feature in Galaxy), and run your workflow over this collection. Should work, will work for sure in the near future, no promises ;)
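The splitting step could look roughly like the Python sketch below. It assumes the server's answer has been saved as a plain text file with one id per line; the function name `split_ids` and the `id_N.txt` file naming are made up for illustration. Uploading the resulting files and grouping them into a collection is not shown.

```python
from pathlib import Path


def split_ids(id_file, out_dir):
    """Write each non-empty line of id_file to its own file in out_dir.

    Returns the list of files written, in the order of the ids.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(id_file, encoding="utf-8") as handle:
        ids = [line.strip() for line in handle if line.strip()]
    # Zero-pad the counter so the files sort in the original order.
    width = len(str(len(ids)))
    written = []
    for index, doc_id in enumerate(ids, start=1):
        # Name files by position, not by id, since an id may contain
        # characters that are not safe in a file name.
        target = out_dir / f"id_{index:0{width}d}.txt"
        target.write_text(doc_id + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Each output file then holds exactly one id, so a workflow written for a single id can be run once per file.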
Oh yes, this is supported out of the box! See here for some brief documentation: https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox#support...
Here is an example of how you can write your own datatypes:
https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/datatyp...
I feel like I must be missing the obvious. Here is the relevant section of my datatypes_conf.xml (you can see the full file at https://github.com/oanc/Galaxy/blob/master/config/datatypes_conf.xml)
<datatype extension="lif" type="galaxy.datatypes.text:Json" display_in_upload="True">
    <converter file="convert.json2gate_2.0.0.xml" target_datatype="gate"/>
</datatype>
<datatype extension="gate" type="galaxy.datatypes.xml:GenericXml" mimetype="application/xml" display_in_upload="true">
    <converter file="convert.gate2json_2.0.0.xml" target_datatype="lif"/>
</datatype>
Is there anything I need to do beyond defining the datatypes for implicit conversions to take place?
I guess you need to place your converters under https://github.com/oanc/Galaxy/tree/master/lib/galaxy/datatypes/converters/ and get rid of 'convert.' in your datatypes_conf.xml, at least if you are not using the TS.
Hope this helps you a little bit more,
Bjoern
Thanks, Keith Suderman
4. OAuth 2.0 / OpenID Connect:
I need to be able to fetch documents from data providers that require an OAuth 2.0 access token. Currently, I use a separate service to go through the OAuth authentication/authorization process and then have the user copy/paste their access token into a text field in Galaxy. Is there a way to perform the OAuth authentication dance required by the remote service inside Galaxy itself?
I don't think so, but maybe someone else has an idea here.
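For reference, the copy/paste approach described above boils down to something like the following Python sketch on the tool side. It assumes the data provider accepts the access token as a bearer token in the Authorization header, which is the usual OAuth 2.0 convention but may differ per provider. The function name and the example URL are made up.

```python
import urllib.request


def build_document_request(url, access_token):
    """Build (but do not send) a GET request carrying an OAuth 2.0 bearer token."""
    # Pasted tokens often pick up stray whitespace or a trailing newline.
    token = access_token.strip()
    if not token:
        raise ValueError("access token is empty")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

A tool script would pass the value of the text field as `access_token` and then hand the request to `urllib.request.urlopen()` to fetch the document, for example with a URL such as https://provider.example/documents/id1 (hypothetical).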
I’ve looked at the Trello site for Galaxy and see that both OAuth 2.0 and OpenID Connect are on the radar; hopefully this use case is being considered as well.
I’m sure to have more questions after working through some visualization examples, but this should keep me busy for now.
Hope you are busy now :) Cheers and keep us up to date! Bjoern
Sincerely, Keith Suderman
------------------------------ Research Associate Department of Computer Science Vassar College Poughkeepsie, NY