Hi again,
For anyone who comes across this message: the issue turned out to be a problem with a specific version of the OpenAFS client. The server is running Debian Wheezy, and openafs-client was at version 1.6.1. Using the backported version 1.6.5 solved the issue.
Regards,
Renato
Quoting Renato Alves on 18-10-2013 03:18:
Hi everyone,
I'm currently setting up Galaxy to run on top of AFS, using Torque to handle jobs.
Everything is set up according to the wiki documentation, but I'm having a weird filesystem corruption problem.
The setup is the following:
machine A: runs Galaxy. Galaxy's home folder is on AFS.
machine B: runs the Torque server and shares Galaxy's home folder.
When I launch a process via Galaxy, everything works as expected, but the output file (NC_010473.tabular) ends up corrupted with 4 KB of leading zero bytes. The corruption is reproducible every time, regardless of file size. In the attached example, the original file is NC_010473.faa.
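For anyone wanting to check whether their own output files show the same symptom, here is a minimal Python sketch that counts the zero bytes at the start of a file. The function name `count_leading_zero_bytes` and the chunk size are only illustrative and are not part of Galaxy or OpenAFS. A result of 4096 on an output file would match the corruption described above.

```python
import sys


def count_leading_zero_bytes(path, chunk_size=4096):
    """Return the number of consecutive 0x00 bytes at the start of the file."""
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # Reached end of file: every byte read so far was zero.
                return count
            stripped = chunk.lstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                # Found the first non-zero byte, so stop counting.
                return count


if __name__ == "__main__":
    # Hypothetical usage: python check_zeros.py NC_010473.tabular
    for name in sys.argv[1:]:
        print(name, count_leading_zero_bytes(name))
```

Running it once on the file as seen through the AFS client, and again after a cache flush, should show whether the zero bytes are only in the client's cached view.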
If I restart the OpenAFS client or flush the AFS file cache, the corruption goes away. Moreover, if I re-run the same script created by Galaxy through Torque, the corruption doesn't happen. Hence it only happens when the job is launched via Galaxy.
I also tried both the DRMAA and the PBS modules, but the corruption remained.
Does anyone know what could be the cause of this?
Thanks,
Renato