Re: [galaxy-dev] FTP upload - symlink to uploaded data
Joachim, Nate,

Leon Mei pointed me to a mailing list post of August 2012 where you two discussed a problem with uploads to Galaxy filling up /tmp. I think I have traced this down now after we suffered from this too several times.

There are a number of places where temporary files are configurable in galaxy, but there is (at least) one place that uses the Python default directory (can be set with TMPDIR or some other envvars, but if you don't it is often /tmp). The "unconfigurable" place is tools/data_source/upload.py, where the code reads:

    if dataset.type == 'url':
        try:
            page = urllib.urlopen( dataset.path ) #page will be .close()ed by sniff methods
            temp_name, dataset.is_multi_byte = sniff.stream_to_file( page, prefix='url_paste', source_encoding=util.get_charset_from_http_headers( page.headers ) )
        except Exception, e:
            file_err( 'Unable to fetch %s\n%s' % ( dataset.path, str( e ) ), dataset, json_file )
            return
        dataset.path = temp_name

sniff.stream_to_file uses the tempfile module, and since there is no "dir=" in the argument list to this call, the temporary file is made in /tmp. The central solution for the main galaxy code is in lib/galaxy/config.py:

    self.new_file_path = resolve_path( kwargs.get( "new_file_path", "database/tmp" ), self.root )
    tempfile.tempdir = self.new_file_path

But this assignment to "tempdir" does not help in this case because upload.py is a tool?

It would be nice to fix this, which we can obviously do ourselves for our andromeda deployment, but it would be better to do it centrally.

Regards,
Rob

--
Rob W.W. Hooft
Chief Technology Officer BioAssist, Netherlands Bioinformatics Centre
http://www.nbic.nl/
Skype: robhooft GSM: +31 6 27034319
Hi Rob,

Indeed, I had/have some difficulties with setting temporary directories. The problem was that FTP uploaded data was first copied to TMPDIR prior to being put in the database directory.

My solution: I extended the /tmp partition to several GB by mounting a bigger device over it. In addition, I have a large network share, which is mounted on /mnt/galaxytemp. The __new_file_path__ setting points there. A bit messy indeed.

I just had another discussion about the temporary directories with John Chilton and Jeremy Goecks, which you can read here:
https://bitbucket.org/galaxy/galaxy-central/pull-request/139/letting-cuffdif...

From what I understood, __new_file_path__ is going to be phased out in favour of __job_working_directory__. But apparently the job_working_directory is not a temporary directory (in my case, it contains symlinks from the job_working_directory to database/files. In addition, job_working_directory is by default part of the database directory of Galaxy.)

The suggestion is to set the TMPDIR environment variable to a directory you specify. I have one file in /home/galaxy that contains the environment settings, and which gets sourced in the init script that launches Galaxy.

Cheers,
Joachim

Joachim Jacob
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On Wed 27 Mar 2013 08:44:08 PM CET, Rob Hooft wrote:
Joachim, Nate,
Leon Mei pointed me to a mailing list post of August 2012 where you two discussed a problem with uploads to Galaxy filling up /tmp. I think I have traced this down now after we suffered from this too several times.
There are a number of places where temporary files are configurable in galaxy, but there is (at least) one place that uses the Python default directory (can be set with TMPDIR or some other envvars, but if you don't it is often /tmp). The "unconfigurable" place is tools/data_source/upload.py, where the code reads:
    if dataset.type == 'url':
        try:
            page = urllib.urlopen( dataset.path ) #page will be .close()ed by sniff methods
            temp_name, dataset.is_multi_byte = sniff.stream_to_file( page, prefix='url_paste', source_encoding=util.get_charset_from_http_headers( page.headers ) )
        except Exception, e:
            file_err( 'Unable to fetch %s\n%s' % ( dataset.path, str( e ) ), dataset, json_file )
            return
        dataset.path = temp_name
sniff.stream_to_file uses the tempfile module, and since there is no "dir=" in the argument list to this call, the temporary file is made in /tmp. The central solution for the main galaxy code is in lib/galaxy/config.py:
    self.new_file_path = resolve_path( kwargs.get( "new_file_path", "database/tmp" ), self.root )
    tempfile.tempdir = self.new_file_path
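To illustrate the two mechanisms mentioned here, the following is a minimal standalone sketch using only the standard tempfile module. It is not Galaxy code: the directory names are made-up stand-ins, and mkstemp is used only as an example of a tempfile call, since the internals of sniff.stream_to_file are not shown in this thread. It shows that a call without "dir=" follows the module-level tempfile.tempdir, while an explicit "dir=" overrides it.

```python
import os
import tempfile

# Hypothetical stand-ins for Galaxy's new_file_path and for an explicit dir= target.
new_file_path = tempfile.mkdtemp(prefix="new_file_path_")
explicit_dir = tempfile.mkdtemp(prefix="explicit_dir_")

saved_tempdir = tempfile.tempdir
tempfile.tempdir = new_file_path  # same kind of assignment as in config.py
try:
    # No dir= argument: the module-level default directory is used.
    fd, default_path = tempfile.mkstemp(prefix="url_paste")
    os.close(fd)

    # Explicit dir= argument: takes precedence over the module-level default.
    fd, explicit_path = tempfile.mkstemp(prefix="url_paste", dir=explicit_dir)
    os.close(fd)
finally:
    # Restore the previous setting so the sketch has no lasting side effect.
    tempfile.tempdir = saved_tempdir

print(os.path.dirname(default_path) == new_file_path)
print(os.path.dirname(explicit_path) == explicit_dir)
```

Both comparisons hold only within the process that made the assignment, which is the point of the question that follows.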
But this assignment to "tempdir" does not help in this case because upload.py is a tool?
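A small experiment along these lines, assuming that upload.py is started as a separate Python process (as the question suggests, not confirmed in this thread): an assignment to tempfile.tempdir lives only in the memory of the process that made it, whereas the TMPDIR environment variable is inherited by child processes. The directory names and the helper child_tempdir below are hypothetical and exist only for this sketch.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical directories: one set only inside the parent, one passed via TMPDIR.
parent_only_dir = tempfile.mkdtemp(prefix="parent_only_")
env_dir = tempfile.mkdtemp(prefix="from_env_")

saved_tempdir = tempfile.tempdir
tempfile.tempdir = parent_only_dir  # affects this process only

child_code = "import tempfile; print(tempfile.gettempdir())"


def child_tempdir(env):
    """Start a fresh Python interpreter and return the temp dir it would use."""
    out = subprocess.check_output([sys.executable, "-c", child_code], env=env)
    return out.decode().strip()


try:
    # Child with the unchanged environment: it does not see the parent's assignment.
    inherited = child_tempdir(dict(os.environ))

    # Child with TMPDIR set in its environment: tempfile picks it up.
    from_env = child_tempdir(dict(os.environ, TMPDIR=env_dir))
finally:
    tempfile.tempdir = saved_tempdir

print(inherited != parent_only_dir)
print(from_env == env_dir)
```

If that assumption about upload.py holds, setting TMPDIR in the environment that launches Galaxy would reach the tool, while the config.py assignment would not.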
It would be nice to fix this, which we can obviously do ourselves for our andromeda deployment, but it would be better to do it centrally.
Regards,
Rob
--
Rob W.W. Hooft
Chief Technology Officer BioAssist, Netherlands Bioinformatics Centre
http://www.nbic.nl/
Skype: robhooft GSM: +31 6 27034319