Glen Beane wrote:
On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
I've been doing some testing with a Galaxy instance running on my laptop for some tools we are developing. I am uploading a file into Galaxy from a URL to use as test input (~1.5 GB, tabular). I can download this file to my laptop in about 30 seconds with wget, but pulling it from the same URL into Galaxy takes about 30 minutes. I set the file type explicitly so Galaxy did not have to auto-detect it.
This seems very slow, considering it takes only about 30 seconds to fetch the file over the network and write it to disk. What is Galaxy doing that makes this upload so slow? We also tried defining our own datatype (based on data rather than tabular, on the theory that Galaxy might be examining tabular files), but it is still very slow. In production our input files will grow much larger than this (although by then we'll probably abandon tabular for a more compact binary format).
So, no insight into why a 1.5 GB file takes 60 times as long to load into Galaxy via URL as it takes to download from the same URL outside of Galaxy? I'm assuming it has to do with detecting metadata, since changing the file type from our custom tabular type to the Galaxy tabular type triggers a set-metadata job that takes at least 20 minutes (I didn't time it exactly). However, I changed our datatype from tabular to "data", hoping Galaxy would just ignore the file contents, and it still takes 30 minutes to load into Galaxy.
We haven't updated to the latest galaxy-dist yet (syncing up is on our to-do list), but this seems to take much longer than it should, which suggests a problem with the implementation.
Hi Glen,

Sorry, I haven't had a chance to address your question yet. The reason is most likely metadata, as you surmised. Do you have

    set_metadata_externally = True

set in universe_wsgi.ini? Also, there are some recent changes in the newest dist release that limit the number of lines checked for metadata, which should make this process much faster.

--nate
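A quick way to confirm the setting Nate asks about is to read universe_wsgi.ini programmatically. The Python sketch below is a minimal illustration, not part of Galaxy: the helper name `metadata_set_externally` is hypothetical, and it assumes the option sits in the `[app:main]` section of the file, which this thread does not confirm.

```python
import configparser


def metadata_set_externally(ini_path, section="app:main"):
    """Return True if set_metadata_externally is enabled in ini_path.

    Hypothetical helper. Assumes the option lives in [app:main]; pass a
    different section name if your file is laid out differently.
    """
    # interpolation=None: leave values like %(here)s untouched instead of
    # trying to expand them. strict=False: tolerate duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    if not parser.read(ini_path):
        # read() silently skips missing files, so report it explicitly.
        raise FileNotFoundError(ini_path)
    # getboolean accepts True/true/yes/on/1; a missing section or option
    # (e.g. a commented-out "#set_metadata_externally = True") yields False.
    return parser.getboolean(section, "set_metadata_externally", fallback=False)
```

For example, `metadata_set_externally("universe_wsgi.ini")` returns `False` when the line is absent or commented out, which would be consistent with metadata being set inside the upload job itself rather than externally.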
--
Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev