On Mar 15, 2011, at 7:03 PM, Darren Brown wrote:
But I am kind of stuck on what these parameters actually mean.
The execute_workflow.py command-line inputs are indeed a little clunky, given all the information that has to go into each dataset mapping parameter. I didn't imagine that this script would actually be used directly very often; rather, it was meant to serve as an example of how to execute a single workflow from code with particular inputs. The three parts are workflow step, source type, and input id. For the source type, use 'hda' with the encoded id you're getting from a history, or 'ldda' for an id from a library dataset.
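For reference, here's roughly what that dataset mapping boils down to once it hits the API. This is only a sketch of the request that execute_workflow.py ends up building; the URL, key, step numbers and encoded ids below are placeholders, and if I'm remembering right you can get the real input step ids for a workflow from GET /api/workflows/<workflow_id>:

import json
import urllib2

api_key = 'YOUR_API_KEY'                               # placeholder
api_url = 'http://localhost:8080/api/workflows'        # your Galaxy instance

payload = {
    'workflow_id': 'ENCODED_WORKFLOW_ID',              # encoded id of the workflow to run
    'history': 'API workflow run',                     # name of a new history for the outputs
    'ds_map': {
        # one entry per input step: step id -> source type plus encoded dataset id
        '1': dict( src='hda', id='ENCODED_HISTORY_DATASET_ID' ),
        '2': dict( src='ldda', id='ENCODED_LIBRARY_DATASET_ID' ),
    },
}

request = urllib2.Request( '%s?key=%s' % ( api_url, api_key ),
                           json.dumps( payload ),
                           { 'Content-Type': 'application/json' } )
print urllib2.urlopen( request ).read()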
Which brings me to my general question. While I appear to be close to selecting the correct history and workflow ids, it only works right now as a proof of concept, since I would need to generate these on my own for a user to run a workflow via the Galaxy interface. It seems you are hashing these history, workflow and dataset ids, but I am not really sure what you are using to hash them; it doesn't look like a SHA1 sum. Given only access to the Galaxy database, I would like to execute a workflow, so I would need to be able to generate the hashed values to throw at the API. Does that make sense?
You could definitely generate the encoded ids on your own if you wanted; they aren't hashes, though. We use the Blowfish implementation in pycrypto, with the 'id_secret' in your universe_wsgi.ini as the key. Given an object_id and the id_secret from Galaxy, you should be able to do something like this (code directly from lib/galaxy/web/security/__init__.py):

from Crypto.Cipher import Blowfish

# Python 2 / pycrypto; id_secret_from_galaxy is the 'id_secret' value from
# universe_wsgi.ini, object_id is the integer id from the database
cipher = Blowfish.new( id_secret_from_galaxy )
str_id = str( object_id )
# pad to a multiple of 8 bytes with '!' before encrypting, then hex-encode
padded_id = ( "!" * ( 8 - len( str_id ) % 8 ) ) + str_id
encoded_id = cipher.encrypt( padded_id ).encode( 'hex' )

Ideally, however, this would all be done through the API rather than by reaching into the database directly. Dataset-level operations for pushing files into Galaxy and listing them (and retrieving ldda ids for use in things like the workflows API component) are supported for datasets in data libraries, but not yet for individual histories; I'd imagine that support is forthcoming soon. In the meantime, you might want to consider using a data library as an initial import destination from which you can do further work.

The example_watch_folder.py script in scripts/api has a more comprehensive example of programmatically executing a workflow on many datasets at once, as well as importing those files from the filesystem into Galaxy. You should also be able to use the same approach I used there for finding or creating a data library to grab a workflow by name, instead of having to figure out the id ahead of time.
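If you ever need to go the other direction, from an encoded id the API hands back to the integer id in the database, it's just the reverse of the above. A minimal sketch, reusing the same cipher object:

# undo the hex encoding and the Blowfish encryption, then strip the '!' padding
decoded_id = int( cipher.decrypt( encoded_id.decode( 'hex' ) ).lstrip( '!' ) )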
Finally, can I generate an api key programmatically as well? Not the end of the world, but it would be nice.
No, though I suppose you could hack something together if you wanted, since you do have direct access to the database and don't seem to be opposed to poking around in there. All you'd need is a user's id and whatever you want the key to be; toss that into the api_keys table, making sure the user doesn't already have one set.

Hope this helps, and thanks for exploring all this new ground!

Dannon
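P.S. If you do go the database route for the key, here's a rough sketch of what I mean. It assumes the default SQLite database at database/universe.sqlite and an api_keys table with create_time, user_id and key columns; double-check both against your install, and adjust the connection if you're on Postgres or MySQL.

import datetime
import sqlite3
import uuid

conn = sqlite3.connect( 'database/universe.sqlite' )   # default location; yours may differ

user_id = 1                    # galaxy_user.id of the account that should get a key
new_key = uuid.uuid4().hex     # any hard-to-guess unique string will do

# only insert a key if the user doesn't already have one set
existing = conn.execute( 'SELECT key FROM api_keys WHERE user_id = ?', ( user_id, ) ).fetchone()
if existing is None:
    conn.execute( 'INSERT INTO api_keys ( create_time, user_id, key ) VALUES ( ?, ?, ? )',
                  ( datetime.datetime.utcnow(), user_id, new_key ) )
    conn.commit()
    print 'new api key:', new_key
else:
    print 'user already has a key:', existing[0]

conn.close()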