Corner case in task splitter - merging zero files - galaxy-dev

18 Oct 2012

Hi Scott,

Following some failing hard drives, I'm rebuilding our Galaxy server.
Something isn't quite right with our cluster integration yet, but it has
exposed a problem in Galaxy's handling of task splitting - it can
sometimes attempt to merge zero files.

Here is my fix for the BLAST XML format (now in the ToolShed),
https://bitbucket.org/peterjc/galaxy-central/changeset/5cb6411bad19802ba4001...

Here's an example using the text format:

galaxy.jobs.splitters.multi ERROR 2012-10-18 16:26:21,330 Error merging files
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 133, in do_merge
    output_type.merge(output_files, output_file_name)
  File "/mnt/galaxy/galaxy-central/lib/galaxy/datatypes/data.py", line 545, in merge
    raise Exception('Result %s from %s' % (result, cmd))
Exception: Result 2 from cat  > /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat

The problem is that while "cat file1 ... fileN > merged" works fine for
one or more files, with no files cat sits waiting for stdin (and from a
user's perspective the job stalls).

This logic error is in the merge method in lib/galaxy/datatypes/data.py,
which could treat zero files either as an error or as a no-op:

        if len(split_files) == 1:
            cmd = 'mv -f %s %s' % ( split_files[0], output_file )
        else:
            cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
        result = os.system(cmd)
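
To make the failure concrete, here is a standalone sketch (not Galaxy code;
build_merge_command is a hypothetical helper that only mirrors the string
formatting quoted above) showing the command that gets built for an empty
list:

```python
def build_merge_command(split_files, output_file):
    """Hypothetical helper: reproduces only the command-string logic above."""
    if len(split_files) == 1:
        return 'mv -f %s %s' % (split_files[0], output_file)
    # With an empty list the join yields '', leaving cat with no input files.
    return 'cat %s > %s' % (' '.join(split_files), output_file)


print(build_merge_command([], 'merged.dat'))  # prints: cat  > merged.dat
```

The doubled space after "cat" matches the command shown in the traceback.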

I think this should be something like this:

        if not split_files:
            raise Exception('Asked to merge zero files')
        elif len(split_files) == 1:
            cmd = 'mv -f %s %s' % ( split_files[0], output_file )
        else:
            cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
        result = os.system(cmd)
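
For comparison, the same guard can be written without shelling out at all.
This is only a sketch under assumptions: merge_files is a hypothetical
standalone function, not the existing Galaxy method, and it uses the
standard library's shutil instead of mv and cat:

```python
import shutil


def merge_files(split_files, output_file):
    """Hypothetical sketch: concatenate split_files into output_file."""
    if not split_files:
        # Fail fast rather than running a command that waits on stdin.
        raise ValueError('Asked to merge zero files')
    if len(split_files) == 1:
        # A single part just needs moving into place.
        shutil.move(split_files[0], output_file)
        return
    with open(output_file, 'wb') as out:
        for name in split_files:
            with open(name, 'rb') as part:
                shutil.copyfileobj(part, out)
```

Avoiding the shell also sidesteps quoting problems with unusual file names,
though whether that matters here is a separate question from the zero-file
case.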

It might also make sense to check for zero files in the code which
calls the merge, i.e. the do_merge function in
lib/galaxy/jobs/splitters/multi.py. I'm still investigating upstream
how this comes about; one clue:

galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:01,930 (273/510) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,040 (273/510) state change: job finished, but failed
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,074 Job output not returned from cluster
galaxy.jobs DEBUG 2012-10-18 16:25:03,074 task 641 for job 273 ended; exit code: 0
galaxy.jobs DEBUG 2012-10-18 16:25:03,148 task 641 ended
galaxy.jobs.runners.tasks DEBUG 2012-10-18 16:25:05,169 execution finished - beginning merge: tblastx -query "/mnt/galaxy/galaxy-central/database/files/000/dataset_127.dat"   -db "/var/local/blast/ncbi/nt" -query_gencode 2 -evalue 0.001 -out /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat -outfmt 0 -num_threads 8
galaxy.jobs.splitters.multi DEBUG 2012-10-18 16:25:05,181 files []
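
Given that the last log line shows an empty file list reaching the merge
step, the caller-side check mentioned above might look roughly like this.
It is a sketch only: merge_if_any and its arguments are hypothetical names,
and the real do_merge signature is not assumed here:

```python
import logging

log = logging.getLogger(__name__)


def merge_if_any(merge_func, output_files, output_file_name):
    """Hypothetical wrapper: call merge_func only if there is input.

    Returns True if a merge was attempted, False if the list was empty.
    """
    if not output_files:
        # Report the empty list instead of handing it to the datatype merge.
        log.error('No files to merge into %s', output_file_name)
        return False
    merge_func(output_files, output_file_name)
    return True
```

Whether an empty list should be logged and skipped, as here, or raised as an
error is the same design choice as in the merge method itself.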

If you would prefer that small suggestion as a pull request, let me know.

Regards,

Peter
