On Jan 23, 2012, at 6:07 PM, Jaime Frey wrote:
Hello, everyone.
I'm working on a Condor job-runner module for Galaxy that can provide better integration than the existing drmaa module. I'm sure I'll have many questions as I work on this module, but I have just a few to start with:
- What versions of Python does Galaxy support (i.e. which newer features of Python do I need to avoid)?
Hi Jaime,
The current minimum is 2.5, and there are no plans to drop it any time soon (2.4 was dropped only a few months ago).
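As a concrete illustration (not taken from Galaxy's codebase), here is a minimal sketch of constructs to watch for when targeting Python 2.5: the with statement still needs its __future__ import there, while str.format(), the print() function, dict comprehensions, and the "except ... as e" syntax only arrive in 2.6/2.7.

# Illustrative only (not Galaxy code): staying within Python 2.5.
from __future__ import with_statement   # 'with' needs this on 2.5

def read_first_line(path):
    # str.format() and the print() function are 2.6+; use %-formatting
    # and the print statement instead.
    with open(path) as fh:
        line = fh.readline()
    print "first line: %s" % line.rstrip()
    return line

def describe_exit_code(code):
    # dict comprehensions are 2.7-only and 'except E as e' is 2.6+;
    # fall back to dict() over a generator and the 'except E, e' form.
    names = dict((c, "exit %d" % c) for c in (0, 1, 2))
    try:
        return names[code]
    except KeyError, e:
        return "unknown exit code (%s)" % e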
- What is the status of the 'staged' cluster deployment? The wiki says it's not in active use or development. But our Condor cluster doesn't have a shared filesystem, so we need to stage data files to/from the execute nodes.
You can see the code over in lib/galaxy/jobs/runners/pbs.py. I haven't used it in a couple of years, and there would definitely be some potential problems, as outlined below.
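For what it's worth, here is a rough sketch (names and structure are assumptions, not Galaxy's actual runner code) of the kind of submit description a staged Condor runner could generate, leaning on Condor's own file transfer machinery instead of a shared filesystem:

# A sketch (not Galaxy code) of a submit description for a staged job,
# using Condor's file transfer rather than a shared filesystem.
def build_submit_description(executable, input_files, output_files, log_prefix):
    """Return an HTCondor submit description that ships job files
    to and from the execute node."""
    return "\n".join([
        "universe = vanilla",
        "executable = %s" % executable,
        # Ask Condor to stage files instead of assuming a shared filesystem.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files = %s" % ",".join(input_files),
        "transfer_output_files = %s" % ",".join(output_files),
        "output = %s.out" % log_prefix,
        "error = %s.err" % log_prefix,
        "log = %s.log" % log_prefix,
        "queue",
        "",
    ])

A runner along these lines would write the description to a file, hand it to condor_submit, and watch the job's user log for state changes.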
- The wiki also says that even with the staged cluster deployment, a shared directory (galaxy_dist/database/tmp by default) is required for some tools. These tools write temporary files on the Galaxy server and read them on the execute node. Does anyone know which tools those are? Are these files ever written after the job is handed to the batch system? Do these tools write files into galaxy_dist/database/tmp on the execute nodes and read the files on the Galaxy server?
Unfortunately I don't have a list, but see the tools which use work_dir (such as tophat) and any that pass the extra_files_path attribute.
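To make that constraint concrete, a staged runner would probably need to sweep such server-side temporary files (the job working directory plus anything under galaxy_dist/database/tmp that the job references) into the transfer lists built above. A hypothetical helper, with made-up names, might look like:

import os

def add_staged_tmp_files(working_directory, referenced_tmp_files, input_files):
    """Hypothetical helper: queue up files a tool wrote on the Galaxy
    server so Condor ships them to the execute node with the inputs."""
    for name in os.listdir(working_directory):
        input_files.append(os.path.join(working_directory, name))
    for path in referenced_tmp_files:
        # e.g. files under galaxy_dist/database/tmp named on the command line
        input_files.append(path)
    return input_files

Files the tool writes into its scratch directory on the execute node come back automatically if transfer_output_files is left unset, but anything expected to reappear under database/tmp by absolute path on the Galaxy server would still need explicit handling.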
--nate
Thanks and regards,
Jaime Frey
UW-Madison Condor Team