Thanks for the quick reply.
It might help to take a look at what I have so far. Our Galaxy server is running at
http://grid.anc.org:8000. The front page contains a very basic tutorial (and I use the term loosely) with instructions for creating a simple workflow with our tools. The "simple tutorial" assumes a good understanding of Galaxy.
It is also probably worth pointing out that all of our "tools" invoke SOAP/REST web services running on other servers; nothing is "local". I tend to use the terms "tool" and "service" interchangeably since all of our tools are web services.
Can you tell me how your tool detects whether its input was processed before by
another tool? Metadata detection? Is this a different file type? If so,
you can define your own datatype(s). One of your tools can then only consume
the file types of another tool's output, and so on.
We have two issues:
1. the format of the data
2. what the data contains.
Format: This is easy; I have converters defined in the datatypes_conf.xml file and the workflow editor won't let me connect tools if the output/input formats don't match.
Data Contents: Just because Tool A produces format X and Tool B accepts format X does not mean the tools can be connected; I need to do deeper validation than simple format matching. For example, if you drop several of the GATE tools into a workflow you can connect them in any order, as they all accept and produce the same format. However, the tools must be connected in the correct order to produce something other than error messages.
There are two ways we do validation right now: each document contains metadata that describes the contents, and each service (tool) can produce metadata describing its input and output. So we can check if two tools can be connected (the output of the first satisfies the input requirements of the second) and each tool checks at runtime that the input contains the necessary data.
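To make the connection check concrete, here is a rough Python sketch of the idea. It is not our actual service code: the ServiceMetadata class, the validate_chain function, and the annotation names ("token", "pos") are all made up for illustration. It assumes each service declares the set of annotations it requires and the set it produces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical description of what a service requires and produces."""
    name: str
    requires: frozenset = field(default_factory=frozenset)
    produces: frozenset = field(default_factory=frozenset)


def validate_chain(initial, services):
    """Return a list of error strings; an empty list means the order is valid.

    `initial` is the set of annotations already present in the document.
    Each service must find everything it requires among the annotations
    available so far; whatever it produces is then added to that set.
    """
    available = set(initial)
    errors = []
    for service in services:
        missing = service.requires - available
        if missing:
            errors.append(f"{service.name}: missing {sorted(missing)}")
        available |= service.produces
    return errors


# Illustrative services only; the annotation names are assumptions.
tokenizer = ServiceMetadata("tokenizer", frozenset(), frozenset({"token"}))
tagger = ServiceMetadata("tagger", frozenset({"token"}), frozenset({"pos"}))

print(validate_chain(set(), [tokenizer, tagger]))  # correct order
print(validate_chain(set(), [tagger, tokenizer]))  # wrong order
```

Both services would share one format as far as datatypes_conf.xml is concerned, so only a check like this on the contents distinguishes the two orderings.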
A typical workflow might look something like:
a) Query Tool -> Server, find all documents that contain the word “cheese”
b) Server -> Here is the list of document IDs [ id1, id2, …, idn ]
c) WorkFlow -> for each id in the list do
c1) Download document
c2) Work work work work…
c3) Persist output
I can do all of the above except the most important bit: iterating…
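Outside of Galaxy, the iteration in step c is nothing more than the loop below. This is only a sketch of the behaviour I am after, not something Galaxy provides as far as I know: run_for_each and the three callables (download, process, persist) are placeholders standing in for steps c1 to c3.

```python
def run_for_each(ids, download, process, persist):
    """Apply the same per-document steps to every id.

    Returns a dict mapping each failed id to the exception it raised,
    so one bad document does not stop the remaining ones.
    """
    failures = {}
    for doc_id in ids:
        try:
            document = download(doc_id)   # c1) download document
            result = process(document)    # c2) the actual work
            persist(doc_id, result)       # c3) persist output
        except Exception as exc:
            failures[doc_id] = exc
    return failures


# Stand-in callables for illustration; real ones would call the web services.
corpus = {"id1": "cheese please", "id2": "more cheese"}
store = {}

failures = run_for_each(
    ["id1", "id2", "id3"],
    download=lambda doc_id: corpus[doc_id],
    process=str.upper,
    persist=store.__setitem__,
)
print(store)
print(sorted(failures))
```

With hundreds or thousands of ids, collecting failures instead of aborting on the first one seems like the safer default.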
Oh yes, this is simple. Just create one workflow that deals with one ID.
You can then run this workflow on multiple ids.
That is the question: how do I run the same workflow on multiple ids? The server may return hundreds or thousands of id values, so running the workflow manually for each id is not really an option.
Is there anything I need to do beyond defining the datatypes for implicit conversions to take place?