Hi Gordon,

Thanks for your assistance and the recommendations. Freezing Postgres sounds like hell to me :-) abrt was indeed filling the root partition, so I disabled it.

I have done some export tests, and the behaviour is not consistent:

1. *Size*: in general, exporting smaller datasets worked, and exporting bigger ones (from about 3 GB) usually crashed. So size seemed to be the key factor.
2. But I have since found several histories of 4.5 GB that I was able to export... so much for the size hypothesis.

Another observation: when an export crashes, the corresponding web handler process dies. So now I suspect something is wrong with the datasets themselves, but I cannot trace anything meaningful in the logs. I am not yet confident about turning on extra logging in Python, but apparently this is done with the "logging" module, initialised like logging.getLogger( __name__ ).

Cheers,
Joachim

Joachim Jacob
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/25/2013 05:18 PM, Assaf Gordon wrote:
Hello Joachim,
Couple of things to check:
On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:
Hi,
About the export of a history, which fails: 1. The preparation seems to work fine: choosing 'Export this history' in the History menu leads to a URL that initially reports that the export is still in progress.
2. When the export is finished and I click the download link, the root partition fills up and the browser displays "Error reading from remote server". A folder ccpp-2013-03-25-14:51:15-27045.new is created in /var/spool/abrt, which fills the root partition.

Something in your export is likely not finishing cleanly, but crashing instead (either the creation of the archive, or the download).
The folder "/var/spool/abrt/ccpp-XXXX" (and especially a file named "coredump") hints that the program crashed. "abrt" is a daemon (at least on Fedora) that monitors crashes and tries to keep all relevant information about the program which crashed (http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt....).
So what might have happened is that a program (galaxy's export_history.py or another) crashed during your export, and "abrt" then picked up the pieces (storing a memory dump, for example) and filled your disk.
The handler reports in its log:
"""
galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job is: /mnt/galaxydb/job_working_directory/008/8318
galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to local runner
galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: starting job 8318
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or stdio handling; checking stderr for success
galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: [Errno 2] No such file or directory: '/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
"""
The system reports:
"""
Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
Mar 25 14:51:27 galaxy abrt[16805]: Error writing '/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
"""
One thing to try: if you have galaxy keeping temporary files, try running the "export" command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===
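If the temporary files are still around, the same command can also be wrapped in a small Python snippet (just a sketch, not part of Galaxy) to capture how the process ends; a negative return code means the script was killed by a signal (e.g. -11 for SIGSEGV), which would fit a crash being picked up by abrt:
===
import subprocess

# Command taken from the handler log above; the tmp paths are transient
# and will differ (or be gone) on another run.
cmd = [
    "python",
    "/home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py",
    "-G",
    "/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R",
    "/mnt/galaxytemp/tmpHAEokb/tmpeg7bYF",
    "/mnt/galaxytemp/tmpHAEokb/tmpPXJ245",
    "/mnt/galaxydb/files/013/dataset_13993.dat",
]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()

# A return code < 0 means the process died from a signal (e.g. -11 = SIGSEGV).
print("return code: %s" % proc.returncode)
print("stderr:\n%s" % err)
===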
Another thing to try: modify "export_history.py", adding debug messages to track progress and whether it finishes or not.
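For example (just a sketch; the call sites are illustrative, not the actual structure of export_history.py), a small helper pasted near the top of the script could write timestamped progress messages to a separate file. Writing to a file rather than stderr matters here, because the handler log above shows Galaxy "checking stderr for success", so anything on stderr would mark the job as failed:
===
import time

# Hypothetical debug helper for export_history.py; the log file path is arbitrary.
_DEBUG_LOG = open("/tmp/export_history_debug.log", "a")

def debug(msg):
    _DEBUG_LOG.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), msg))
    _DEBUG_LOG.flush()

debug("export_history.py started")
# ...then call debug() at the interesting points, e.g.:
# debug("adding dataset %s to archive" % dataset_path)
# debug("archive written, exiting normally")
===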
And: check the "abrt" program's GUI, perhaps you'll see previous crashes that were stored successfully, providing more information about which program crashed.
As a general rule, it's best to keep the "/var" directory on a separate partition on production systems, exactly so that filling it up with junk won't interfere with other programs. Even better, give each sub-directory of "/var" its own dedicated partition, so that filling up "/var/log" or "/var/spool" would not fill up "/var/lib/pgsql" and stop Postgres from working.
-gordon