Uploading large files to history through API (some random thoughts)
I went down something of a rabbit hole last night. I thought that uploading files through the API using multipart/form-data worked; only at the very end of my adventure did I realize that it only works for library uploads. I think it would be great to get that working for histories as well. I cannot spend much more time on this right now (I thought I would only need an hour to implement the client-side stuff), but I thought I would post the progress I made here in case someone wants to take up the fight someday.

The first thing is that when issuing multipart/form-data requests, inputs are not currently being deserialized from JSON (so, for instance, in the tools API code the "inputs" variable would be the string containing the JSON, not the Python dictionary).

To bring multipart/form-data requests in line with other content requests, I replaced this in lib/galaxy/web/framework/__init__.py:

    payload = kwargs.copy()
    named_args, _, _, _ = inspect.getargspec(func)
    for arg in named_args:
        payload.pop(arg, None)

with this:

    payload = kwargs.copy()
    named_args, _, _, _ = inspect.getargspec(func)
    for arg in named_args:
        payload.pop(arg, None)
    for k, v in payload.iteritems():
        if isinstance(v, (str, unicode)):
            try:
                payload[k] = simplejson.loads(v)
            except:
                # may not actually be json, just continue
                pass
    payload = util.recursively_stringify_dictionary_keys( payload )

One could also imagine doing this replacement in the tools API controller directly on payload['inputs'] instead.

After that change, files still weren't being matched up with inputs properly. This debug statement, which I added to upload_common.py, demonstrates that file_data is None:

    galaxy.tools.actions.upload_common INFO 2012-11-27 00:22:17,585 Uploaded datasets is {'NAME': u'galxtest694734465387762969.txt', 'file_data': None, 'space_to_tab': None, 'url_paste': None, '__index__': 0, 'ftp_files': None}

To address this, the files_*|file_data parameters need to be moved inside of the inputs dict.
That means, in lib/galaxy/webapps/galaxy/api/tools.py, changing this:

    inputs = payload[ 'inputs' ]

to this:

    inputs = payload[ 'inputs' ]
    for k, v in payload.iteritems():
        if k.startswith("files_"):
            inputs[k] = v

Then the debug line becomes this:

    galaxy.tools.actions.upload_common INFO 2012-11-27 00:33:15,168 Uploaded datasets is {'NAME': u'galxtest2484272839208214846.txt', 'file_data': FieldStorage('files_0|file_data', u'galxtest2484272839208214846.txt'), 'space_to_tab': None, 'url_paste': None, '__index__': 0, 'ftp_files': None}

which matches pretty well with the same line coming from a web browser request:

    galaxy.tools.actions.upload_common INFO 2012-11-27 00:23:57,590 Uploaded datasets is {'NAME': u'', 'file_data': FieldStorage('files_0|file_data', u'second_step.png'), 'space_to_tab': None, 'url_paste': u'', '__index__': 0, 'ftp_files': None}

These changes aside, it still didn't work, which is why this is a rambling e-mail and not a pull request. I got this exception: http://pastebin.com/hhs1pjtP. I guess it has something to do with the session handling being different between API calls and normal web calls.

Anyway, I've inspected the actual requests I was generating and I think they are reasonable; Galaxy just needs to be augmented to accept them :). If someone does land up taking a look at this, I have committed my test case to blend4j so it can be used to really quickly test such a client request (requires git, Java, and Maven):

    % git clone git@github.com:jmchilton/blend4j.git
    % cd blend4j
    % mvn test -Dtest=ToolsTest -Dtest.galaxy.instance=http://localhost:8080/ -Dtest.galaxy.key=<testapikey>

Thanks,
-John
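
For anyone who wants to experiment with the two payload adjustments described in this message without a running Galaxy, here is a minimal standalone sketch in Python 3. It is not Galaxy code: the function name normalize_multipart_payload and the sample values are made up for illustration, and the standard json module stands in for simplejson. It only shows the idea of decoding JSON-valued form fields and then copying the files_* fields into the decoded inputs dict.

```python
import json


def normalize_multipart_payload(payload):
    """Hypothetical helper (not part of Galaxy) combining both adjustments.

    1. String values that parse as JSON are replaced by the parsed object.
    2. Top-level keys starting with "files_" are copied into the "inputs" dict.
    """
    normalized = {}
    for key, value in payload.items():
        if isinstance(value, str):
            try:
                value = json.loads(value)
            except ValueError:
                # Not JSON (for example a plain identifier); keep the raw string.
                pass
        normalized[key] = value

    inputs = normalized.get("inputs")
    if isinstance(inputs, dict):
        for key, value in normalized.items():
            if key.startswith("files_"):
                inputs[key] = value
    return normalized


if __name__ == "__main__":
    fake_upload = object()  # stands in for the uploaded-file object
    form_fields = {
        "tool_id": "example_tool_id",  # illustrative value only
        "inputs": '{"files_0|NAME": "example.txt"}',
        "files_0|file_data": fake_upload,
    }
    result = normalize_multipart_payload(form_fields)
    print(sorted(result["inputs"]))
```

One caveat with decoding every string field this way: a value such as "42" or "true" is valid JSON and would be turned into a number or boolean, which is one argument for decoding only the inputs field in the tools API controller instead.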
The pull request for this patch is still on the queue. Is anything happening with this?

Kyle

On Tue, Nov 27, 2012 at 8:56 AM, John Chilton <chilton@msi.umn.edu> wrote:
participants (2)
- John Chilton
- Kyle Ellrott