Hi Gordon,

Thanks for your assistance and the recommendations. Freezing Postgres sounds like hell to me :-) abrt was indeed filling the root partition, so I disabled it.

I have done some export tests, and the behaviour is not consistent:

1. *Size*: in general, exporting smaller datasets worked, and exporting bigger ones (from about 3 GB) usually crashed. So size seemed to be the key factor.
2. But I have since found several histories of 4.5 GB that I was able to export... so much for the size hypothesis.

Another observation: when an export crashes, the corresponding web handler process dies. So now I suspect something is wrong with the datasets themselves, but I cannot trace anything meaningful in the logs. I am not yet confident about turning on extra logging in Python, but apparently this is done with the "logging" module, initialised like logging.getLogger( __name__ ).

Cheers,
Joachim

Joachim Jacob
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/25/2013 05:18 PM, Assaf Gordon wrote:
Hello Joachim,
Couple of things to check:
On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:
Hi,
About the export of a history, which fails: 1. The preparation seems to work fine: choosing 'Export this history' in the History menu leads to a URL that initially reports that the export is still in progress.
2. When the export is finished and I click the download link, the root partition fills up and the browser displays "Error reading from remote server". A folder ccpp-2013-03-25-14:51:15-27045.new is created in /var/spool/abrt, which fills the root partition.

Something in your export is likely not finishing cleanly, but crashing instead (either the creation of the archive, or the download).
The folder "/var/spool/abrt/ccpp-XXXX" (and especially a file named "coredump") hints that the program crashed. "abrt" is a daemon (at least on Fedora) that monitors crashes and tries to keep all relevant information about the program which crashed (http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt....).
So what might have happened is that a program (galaxy's export_history.py or another) crashed during your export, and "abrt" then picked up the pieces (storing a memory dump, for example) and filled your disk.
The handler reports in its log:
"""
galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job is: /mnt/galaxydb/job_working_directory/008/8318
galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to local runner
galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: starting job 8318
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or stdio handling; checking stderr for success
galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: [Errno 2] No such file or directory: '/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
"""
The system reports:
"""
Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
Mar 25 14:51:27 galaxy abrt[16805]: Error writing '/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
"""
One thing to try: if you have galaxy keeping temporary files, try running the "export" command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===
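If the temporary files are still around, the same command can also be wrapped in a small Python snippet (just a sketch, not part of Galaxy) to capture how the process ends; a negative return code means the script was killed by a signal (e.g. -11 for SIGSEGV), which would fit a crash being picked up by abrt:
===
import subprocess

# Command taken from the handler log above; the tmp paths are transient
# and will differ (or be gone) on another run.
cmd = [
    "python",
    "/home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py",
    "-G",
    "/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R",
    "/mnt/galaxytemp/tmpHAEokb/tmpeg7bYF",
    "/mnt/galaxytemp/tmpHAEokb/tmpPXJ245",
    "/mnt/galaxydb/files/013/dataset_13993.dat",
]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()

# A return code < 0 means the process died from a signal (e.g. -11 = SIGSEGV).
print("return code: %s" % proc.returncode)
print("stderr:\n%s" % err)
===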
Another thing to try: modify "export_history.py", adding debug messages to track progress and whether it finishes or not.
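For example (just a sketch; the call sites are illustrative, not the actual structure of export_history.py), a small helper pasted near the top of the script could write timestamped progress messages to a separate file. Writing to a file rather than stderr matters here, because the handler log above shows Galaxy "checking stderr for success", so anything on stderr would mark the job as failed:
===
import time

# Hypothetical debug helper for export_history.py; the log file path is arbitrary.
_DEBUG_LOG = open("/tmp/export_history_debug.log", "a")

def debug(msg):
    _DEBUG_LOG.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), msg))
    _DEBUG_LOG.flush()

debug("export_history.py started")
# ...then call debug() at the interesting points, e.g.:
# debug("adding dataset %s to archive" % dataset_path)
# debug("archive written, exiting normally")
===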
And: check the "abrt" program's GUI, perhaps you'll see previous crashes that were stored successfully, providing more information about which program crashed.
As a general rule, it's best to keep the "/var" directory on a separate partition on production systems, exactly so that filling it up with junk won't interfere with other programs. Even better, give each sub-directory of "/var" its own dedicated partition, so that filling up "/var/log" or "/var/spool" would not fill up "/var/lib/pgsql" and stop Postgres from working.
-gordon