Zero padding corruption using galaxy with torque on AFS
Hi everyone,
I'm currently setting up Galaxy to run on top of AFS, using Torque to handle jobs.
Everything is set up according to the wiki documentation, but I'm having a weird filesystem corruption problem.
The setup is the following:
machine A: runs Galaxy. Galaxy's home folder is on AFS.
machine B: runs the Torque server and shares Galaxy's home folder.
When I launch a process via Galaxy, everything works as expected, but the output file ends up corrupted with 4 KB of leading zero bytes (file NC_010473.tabular). The corruption is reproducible every time, regardless of file size. In the attached example, the original file is NC_010473.faa.
If I restart the OpenAFS client or flush the AFS file cache, the corruption goes away. However, if I re-run the same script created by Galaxy through Torque, the corruption doesn't happen. Hence it only happens when the job is launched via Galaxy.
I also tried both the DRMAA and the PBS modules, but the corruption remained.
Does anyone know what could be the cause of this?
Thanks, Renato
Hi again,
Just to let anyone who comes across this message know: the issue turned out to be a problem with a specific version of the OpenAFS client. The server is running Debian Wheezy and openafs-client was at version 1.6.1. Using the backported version 1.6.5 solved the issue.
Regards, Renato
Quoting Renato Alves on 18-10-2013 03:18:
Hi everyone,
I'm currently setting up galaxy to run on top of AFS using torque for handling jobs.
Everything is set up according to the wiki documentation, but I'm having a weird filesystem corruption problem.
The setup is the following:
machine A: runs galaxy. Galaxy home folder is on AFS. machine B: runs torque server and shares Galaxy's home folder.
When I launch a process via galaxy, everything works as expected, but the output file ends up corrupted with 4 KB of leading zero bytes (file NC_010473.tabular). The corruption is reproducible every time, regardless of file size. In the attached example, the original file is NC_010473.faa.
If I restart the openafs client or flush the AFS file cache the corruption goes away. However, if I re-run the same script created by galaxy through torque the corruption doesn't happen. Hence it only happens if launched via Galaxy.
I also tried both the DRMAA and the PBS modules but the corruption remained.
Does anyone know what could be the cause of this?
Thanks, Renato
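For anyone trying to confirm the same symptom on their own output files, the following is a minimal sketch in Python that counts how many zero bytes a file starts with. It is not part of Galaxy, Torque, or OpenAFS; the function name count_leading_zero_bytes and the chunk size are assumptions made for illustration only.

```python
import sys


def count_leading_zero_bytes(path, chunk_size=4096):
    """Return the number of consecutive 0x00 bytes at the start of the file.

    Hypothetical helper for illustration; not part of any tool in this thread.
    """
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file reached: every byte read so far was zero.
                return count
            stripped = chunk.lstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                # A non-zero byte was found, so the zero run ends here.
                return count


if __name__ == "__main__":
    # Usage: python check_zeros.py NC_010473.tabular [more files...]
    for name in sys.argv[1:]:
        zeros = count_leading_zero_bytes(name)
        print(f"{name}: {zeros} leading zero bytes")
```

Running it on the same output file from both machines, and again after flushing the AFS file cache, should show whether the zero padding is only visible through one client's cache. A result of 4096 would match the 4 KB described above.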