unpacking zip files during upload
I'm working on unpacking a zip file into multiple datasets. I think this is the code path Upload.py UploadToolAction upload_common.py: get_uploaded_datesets new_upload new_history_upload or new_library_upload Then a job gets spooled Which calles add_file in data_source/upload.py And does the expansion of the zip I can unpack the zip and create files in the dataset's path there. But I don't know how to create more dataset associations, and I'm not sure that it makes sense to create datasets on the fly in data_source/upload.py . Should I pass some information along with data_source/upload.py about how to create dataset object and associate them with library/history associations? Or maybe I can pass in some kind of a callback that can handle the dataset expansion? (I'm pretty new to python, but it seems similar to ruby) I thought about a composite dataset, but that seems like overloading that concept. Really the files I'm thinking about uplaoding are 8 independent BAMs or fastqs or whatever – not a set of files that are related to each other. Any suggestions? Brad -- Brad Langhorst New England Biolabs langhorst@neb.com
On Dec 20, 2011, at 6:16 PM, Langhorst, Brad wrote:
> I'm working on unpacking a zip file into multiple datasets.
> I think this is the code path:
>   Upload.py: UploadToolAction
>   upload_common.py: get_uploaded_datasets -> new_upload -> new_history_upload or new_library_upload
> Then a job gets spooled, which calls add_file in data_source/upload.py and does the expansion of the zip.
> I can unpack the zip and create files in the dataset's path there.
> But I don't know how to create more dataset associations, and I'm not sure that it makes sense to create datasets on the fly in data_source/upload.py.
> Should I pass some information along to data_source/upload.py about how to create dataset objects and associate them with library/history associations? Or maybe I can pass in some kind of a callback that can handle the dataset expansion? (I'm pretty new to Python, but it seems similar to Ruby.)
Hey Brad,

I was working on this a year or two ago and stopped working on the multi-file zip support. I forget all of the details, but I think the hangup had to do with creating associations when the tool runs, as you've found. This would most properly be done in the UploadToolAction, but then all of the dataset ids/filenames/etc. need to be passed to the upload tool, which then has to map those datasets to the right files when tools/data_source/upload.py expands the archive.

It may be possible to use the method at the bottom of this page instead:

http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
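If you do go the tool-action route, the rough shape would be: the action pre-creates one dataset per archive member and writes the job a small manifest mapping member names to the datasets' file paths, and the job script just fills those paths in. A sketch of the job-side half (the manifest format and helper below are made up for illustration; none of this is existing Galaxy code):

    import json
    import shutil
    import zipfile

    # Hypothetical manifest, written by the tool action after it has
    # created one dataset per archive member, e.g.:
    # { "archive": "/tmp/upload123.zip",
    #   "members": { "sample1.bam": "/galaxy/files/dataset_12.dat",
    #                "sample2.bam": "/galaxy/files/dataset_13.dat" } }
    def expand_with_manifest(manifest_path):
        with open(manifest_path) as fh:
            manifest = json.load(fh)
        with zipfile.ZipFile(manifest['archive']) as zf:
            for member, dest in manifest['members'].items():
                # Copy each member straight into the path already
                # reserved for its dataset.
                with zf.open(member) as src, open(dest, 'wb') as dst:
                    shutil.copyfileobj(src, dst)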
> I thought about a composite dataset, but that seems like overloading that concept. Really, the files I'm thinking about uploading are 8 independent BAMs or fastqs or whatever, not a set of files that are related to each other.
Composite would not be the right concept, since these should be considered unrelated files. Thanks for working on this!

--nate
> Any suggestions?
>
> Brad
> --
> Brad Langhorst
> New England Biolabs
> langhorst@neb.com